Combinatorics is a subject of increasing importance, owing to its 
links with computer science, statistics and algebra. This is a textbook 
aimed at second-year undergraduates to beginning graduates. It 
stresses common techniques (such as generating functions and 
recursive construction) which underlie the great variety of subject 
matter, and the fact that a constructive or algorithmic proof is more 
valuable than an existence proof. 

The book is divided into two parts, the second at a higher level and 
with a wider range than the first. Historical notes are included and 
give a wider perspective on the subject. More advanced topics are 
given as projects, and there are a number of exercises, some with 
solutions given. 
Preface 


Ive got to work the E qwations and the low cations 
Ive got to comb the nations of it. 


Russell Hoban, Riddley Walker (1980) 


We have not begun to understand the relationship between combinatorics and 
conceptual mathematics. 


J. Dieudonné, A Panorama of Pure Mathematics (1982) 


If anything at all can be deduced from the two quotations at the top of this page, 
perhaps it is this: Combinatorics is an essential part of the human spirit; but it is 
a difficult subject for the abstract, axiomatising Bourbaki school of mathematics to 
comprehend. Nevertheless, the advent of computers and electronic communications 
has made it a more important subject than ever. 

This is a textbook on combinatorics. It’s based on my experience of more than 
twenty years of research and, more specifically, on teaching a course at Queen Mary 
and Westfield College, University of London, since 1986. The book presupposes 
some mathematical knowledge. The first part (Chapters 2-11) could be studied by 
a second-year British undergraduate; but I hope that more advanced students will 
find something interesting here too (especially in the Projects, which may be skipped 
without much loss by beginners). The second half (Chapters 12-20) is in a more 
condensed style, more suited to postgraduate students. 

I am grateful to many colleagues, friends and students for all kinds of contributions, 
some of which are acknowledged in the text; and to Neill Cameron, for the 
illustration on p. 128. 

I have not provided a table of dependencies between chapters. Everything is 
connected; but combinatorics is, by nature, broad rather than deep. The more 
important connections are indicated at the start of the chapters. 


Peter J. Cameron 
17 March 1994 


1. What is Combinatorics? 


Combinatorics is the slums of topology. 
J. H. C. Whitehead (attr.)¹ 


I have to admit that he was not bad at combinatorial analysis – a branch, 
however, that even then I considered to be dried up. 


Stanislaw Lem, His Master's Voice (1968) 


Combinatorics is special. Most mathematical topics which can be covered in a 
lecture course build towards a single, well-defined goal, such as Cauchy’s Theorem 
or the Prime Number Theorem. Even if such a clear goal doesn’t exist, there is 
a sharp focus (finite groups, perhaps, or non-parametric statistics). By contrast, 
combinatorics appears to be a collection of unrelated puzzles chosen at random. 

Two factors contribute to this. First, combinatorics is broad rather than deep. 
Its tentacles stretch into virtually all corners of mathematics. Second, it is about 
techniques rather than results. As in a net,² threads run through the entire 
construction, appearing unexpectedly far from where we last saw them. A treatment of 
combinatorics which neglects this is bound to give a superficial impression. 

This feature makes the teacher’s job harder. Reading, or lecturing, is inherently 
one-dimensional. If we follow one thread, we miss the essential interconnectedness 
of the subject. 

I have attempted to meet this difficulty by various devices. Each chapter begins 
with a list of topics, techniques, and algorithms considered in the chapter, and 
cross-references to other chapters. Also, some of the material is set in smaller 
type and can be regarded as optional. This usually includes a ‘project’ involving a 
more difficult proof or construction (where the arguments may only be sketched, 
requiring extra work by the reader). These projects could be used for presentations 
by students. Finally, the book is divided into two parts; the second part treats topics 
in greater depth, and the pace hots up a bit (though, I hope, not at the expense of 
intelligibility). 

As just noted, there are algorithms scattered throughout the book. These are not 
computer programs, but descriptions in English of how a computation is performed. 
I hope that they can be turned into computer programs or subroutines by readers 
with programming experience. The point is that an explicit construction of an object 
usually tells us more than a non-constructive existence proof. (Examples will be 
given to illustrate this.) An algorithm resembles a theorem in that it requires a proof 
(not of the algorithm itself, but of the fact that it does what is claimed of it). 


1 This attribution is due to Graham Higman, who revised Whitehead’s definition to ‘Combinatorics 
is the mews of algebra.’ 

2 ‘Net. Anything reticulated or decussated at equal distances, with interstices between the intersec- 
tions.’ Samuel Johnson, Dictionary of the English Language (1775). 


But what is combinatorics? Why should you read further? 

Combinatorics could be described as the art of arranging objects according 
to specified rules. We want to know, first, whether a particular arrangement is 
possible at all, and if so, in how many different ways it can be done. If the rules 
are simple (like picking a cricket team from a class of schoolboys), the existence 
of an arrangement is clear, and we concentrate on the counting problem. But for 
more involved rules, it may not be clear whether the arrangement is possible at all. 
Examples are Kirkman’s schoolgirls and Euler’s officers, described below. 


Sample problems 


In this section, I will give four examples of combinatorial questions chosen to 
illustrate the nature of the subject. Each of these will be discussed later in the book. 


Derangements 


Given n letters and n addressed envelopes, in how many ways can 
the letters be placed in the envelopes so that no letter is in the 
correct envelope? 


Discussion. The total number of ways of putting the letters in the envelopes is the 
number of permutations of n objects,³ which is n! (factorial n). We will see that 
the fraction of these which are all incorrectly addressed is very close to 1/e, where 
e = 2.71828... is the base of natural logarithms — a surprising result at first sight. 
In fact, the exact number of ways of mis-addressing all the letters is the nearest 
integer to n!/e (see Exercise 1). 
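
For small n, the claim can be checked directly by listing all n! permutations. The following is only a brute-force sketch in Python, not an efficient method; the function name `count_derangements` is chosen here for illustration and does not come from the text.

```python
from itertools import permutations
from math import e, factorial


def count_derangements(n):
    """Count the permutations of 0..n-1 that leave no element in its own place."""
    return sum(
        all(p[i] != i for i in range(n))
        for p in permutations(range(n))
    )


if __name__ == "__main__":
    for n in range(1, 9):
        d = count_derangements(n)
        # Columns: n, number of derangements, fraction of all permutations,
        # and the nearest integer to n!/e for comparison.
        print(n, d, d / factorial(n), round(factorial(n) / e))
```

The running time grows like n!, so this is useful only as a check for small cases.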


Kirkman’s schoolgirls 


Fifteen schoolgirls walk each day in five groups of three. Arrange 
the girls’ walks for a week so that, in that time, each pair of girls 
walks together in a group just once. 


Discussion. If it is possible at all, seven days will be required. For any given 
girl must walk once with each of the other fourteen; and each day she walks with 
two others. However, showing that the walks are actually possible requires more 
argument. The question was posed and solved by Kirkman in 1847. The same 
question could be asked for other numbers of girls (see Exercise 2). Only in 1967 
did Ray-Chaudhuri and Wilson show that solutions exist for any number of girls 
congruent to 3 modulo 6. 


Euler’s officers 


Thirty-six officers are given, belonging to six regiments and holding 
six ranks (so that each combination of rank and regiment corresponds 
to just one officer). Can the officers be paraded in a 
6 × 6 array so that, in any line (row or column) of the array, each 
regiment and each rank occurs precisely once? 


3 Permutations will be described in Chapter 3. 


Discussion. Euler posed this problem in 1782; he believed that the answer was ‘no’. 
This was not proved until 1900, by Tarry. Again, the problem can be generalised to 
n² officers, where the number of regiments, ranks, rows and columns is n (we assume 
n > 1); see Exercise 3. There is no solution for n = 2. Euler knew solutions for 
all n not congruent to 2 modulo 4, and guessed that there was no solution for n ≡ 2 
(mod 4). However, he was wrong about that. Bose, Shrikhande and Parker showed 
in 1960 that there is a solution for all n except n = 2 and n = 6. 
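
To make the conditions of the problem concrete, here is a small Python sketch that checks a proposed parade and builds one for odd n. It rests on an assumption not stated in this section: for odd n, placing the officer of regiment (i + j) mod n and rank (i + 2j) mod n in row i, column j is a standard construction that satisfies the requirements. The names `orthogonal_pair` and `is_solution` are hypothetical, and the sketch says nothing about even n.

```python
def orthogonal_pair(n):
    """Return (regiments, ranks) as two n-by-n arrays, for odd n only.

    Assumption: the modular construction below is a standard one for odd n;
    it is not taken from the text.
    """
    if n % 2 == 0:
        raise ValueError("this construction is only valid for odd n")
    regiments = [[(i + j) % n for j in range(n)] for i in range(n)]
    ranks = [[(i + 2 * j) % n for j in range(n)] for i in range(n)]
    return regiments, ranks


def is_solution(regiments, ranks):
    """Check the parade: every row and column contains each regiment and each
    rank exactly once, and each (regiment, rank) combination occurs once."""
    n = len(regiments)
    full = set(range(n))
    for square in (regiments, ranks):
        if any(set(square[i]) != full for i in range(n)):
            return False
        if any({square[i][j] for i in range(n)} != full for j in range(n)):
            return False
    pairs = {(regiments[i][j], ranks[i][j]) for i in range(n) for j in range(n)}
    return len(pairs) == n * n


if __name__ == "__main__":
    for n in (3, 5, 7):
        print(n, is_solution(*orthogonal_pair(n)))
```

The checker `is_solution` works for any n, so it can be used to test arrangements found by hand for the even cases.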


A Ramsey game 


This two-player game requires a sheet of paper and pencils of two 
colours, say red and blue. Six points on the paper are chosen, with 
no three in line. Now the players take a pencil each, and take turns 
drawing a line connecting two of the chosen points. The first player 
to complete a triangle of her own colour loses. (Only triangles with 
vertices at the chosen points count.) 

Can the game ever result in a draw? 


Discussion. We'll see that a draw is not possible; one or other player will be forced 
to create a triangle. Ramsey proved a wide generalisation of this fact. His theorem 
is sometimes stated in the form ‘Complete disorder is impossible.’ 
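
The impossibility of a draw on six points can be confirmed by exhaustive search, since there are only 2^15 ways to colour the fifteen lines. The Python sketch below is a brute-force check rather than a proof in the spirit of the book; the function name `draw_is_possible` and its parameter are illustrative choices.

```python
from itertools import combinations, product


def draw_is_possible(num_points=6):
    """Return True if the lines joining num_points points can all be coloured
    red or blue without creating a triangle of a single colour."""
    edges = list(combinations(range(num_points), 2))
    triangles = list(combinations(range(num_points), 3))
    for colours in product("RB", repeat=len(edges)):
        colour = dict(zip(edges, colours))
        # combinations() yields a < b < c, so each pair below is a key of colour.
        if not any(
            colour[(a, b)] == colour[(a, c)] == colour[(b, c)]
            for a, b, c in triangles
        ):
            return True
    return False


if __name__ == "__main__":
    print(draw_is_possible(5), draw_is_possible(6))
```

The search ignores the order of play and the fact that the players draw alternately; if no colouring at all avoids a one-coloured triangle, then no finished game can either.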


How to use this book 


1. The book is divided into two parts: Chapters 2-11 and Chapters 12-20. In the 
second part, along with some new material, we revisit many of the topics from the 
first part and treat them from a more advanced viewpoint; also, as I mentioned 
earlier, the pace is a little faster in the second part. In any case, a first course can be 
devised using only the first part of the book. (The second/third-year undergraduate 
course at Queen Mary and Westfield College includes a selection of material from 
Chapters 3 (Sections 3.1, 3.2, 3.3, 3.5, 3.7, 3.11, 3.12), 4 (Sections ..., 4.4, 4.5), 5, 
6, 7, 8 and 10; other courses treat material from Chapters 9, 11, ...) 


2. Chapter 3 plays a special rôle. The material here is central to combinatorics: 
subsets, partitions, and permutations of finite sets. Within the other chapters, you are 
encouraged to dabble, taking or leaving sections as you choose; but I recommend 
reading all of Chapter 3 (except perhaps the Projects, see below). 
3. A number of sections are designated as Projects. These are to be regarded as 
less central and possibly more difficult than the others. The word suggests that 
they could be worked through by individuals outside class time, and then made the 
subject of presentations to the class. 
4. Each chapter after this one begins with a box containing ‘topics, techniques, 
algorithms and cross-references’. This is designed to give you some indication of the 
scope of the chapter. Roughly speaking, topics are specific results or constructions; 
techniques are of wider applicability, indicating general methods which may be 
illustrated in specific cases in the chapter; algorithms are self-explanatory; and 
cross-references pinpoint at least some occurrences of the material in other chapters. 
These are usually backward references, but the multidimensional nature of the 
subject means that this is not always so. You should use these as pointers to places 
where you might find help if you are stuck on something. The index can also be 
used for this purpose. 


5. The exercises are a mixed bunch; but, by and large, I have tended to avoid 
‘drill’ and give more substantial problems. You will certainly learn more if you work 
conscientiously through them. But I have tried not to assume that you have done all 
the problems. When (as often happens) the result of an exercise is needed in a later 
chapter, I have usually supplied a proof (or, failing that, a hint). Indeed, hints are 
strewn liberally through the exercises, and some example solutions are given (rather 
more briefly than I would expect from students!) at the end of the book. 


6. The last chapter does two jobs. First, it treats (somewhat sketchily) some further 
topics not mentioned earlier; second, it gives pointers to further reading in various 
parts of combinatorics. I have included a small collection of unsolved problems 
here, to indicate the sort of thing that research in combinatorics might involve. But 
beware: these problems are unsolved; this means that somebody has given some 
thought to them and failed to solve them, so they are probably more difficult than 
the exercises in other chapters. 


7. The numbering is as follows. Chapter A is divided into sections, of which a 
typical one is Section A.B. Within a section, theorems (and similar statements such 
as propositions, lemmas, corollaries, facts, algorithms, and numbered equations) 
have numbers of the form A.B.C. On the other hand, diagrams are just numbered 
within the chapter, as A.D, for example; and exercises are typically referred to 
as ‘exercise E of Chapter A’. Some theorems or facts are displayed in a box for 
easy reference. But don’t read too much into the difference between displayed and 
undisplayed theorems, or between theorems and propositions; it’s a matter of taste, 
and consistency is not really possible. 


8. An important part of combinatorics today is the algorithmic side: I can prove that 
some object exists; how do I construct it? I have described algorithms for a wide 
range of constructions. No knowledge of computers or programming languages is 
assumed. The description of the algorithms makes use of words like ‘While ...’, 
‘Repeat ... until ...’, and so on. These are to be interpreted as having their usual 
English meaning. Of course, this meaning has been taken over by programming 
languages; if you are fluent in Pascal, you will I hope find my descriptions quite 
congenial. If you are a competent programmer and have access to a computer, you 
are advised at several places to implement these algorithms. 


What you need to know 


The mathematical results that I use are listed here. You don’t need everything all 
at once; the more advanced parts of algebra, for example, are only required later 
in the book, so you could study algebra and combinatorics at the same time. If 
all else fails, I have tried to arrange things so that you can take on trust what you 
don’t know. Topics in square brackets are treated in the book, but you may feel the 
need of more explanation from a course or textbook in that subject. As you see, 
combinatorics connects with all of mathematics; you will see material from many 
other areas being used here. 

• Basic pure mathematics: Sets and functions, ordered n-tuples and cartesian 
products; integers, factorisation, modular arithmetic; [equivalence and order 
relations]. 
• Linear algebra: Vector spaces, subspaces; linear transformations, matrices; row 
operations, row space; eigenvalues of real symmetric matrices. 
• Abstract algebra: [Elementary group theory; finite fields]. 
• Number theory: [Quadratic residues; two and four squares theorems]. 
• Analysis: Basic operations (limits, differentiation, etc.); [power series].⁴ 
• Topology: [Definition of metric and topological space; surfaces; Jordan curve 
theorem]. 
• Probability: Basic concepts (for finite spaces only) [except in Chapter 19]. 
• Set theory: See Chapter 19. 


Exercises 
1. For n = 3, 4, 5, calculate the number of ways of putting n letters into their 
envelopes so that every letter is incorrectly addressed. Calculate the ratio of this 
number to n! in each case. 

2. Solve Kirkman’s problem for nine schoolgirls, walking for four days. 

3. Solve Euler’s problem for nine, sixteen and twenty-five officers. Show that no 
solution is possible for four officers. 


4. Test the assertion that the Ramsey game cannot end in a draw by playing it with 
a friend. Try to develop heuristic rules for successful play. 


4 As will be explained in Section 4.2, our treatment of power series is formal and does not involve 
questions of convergence. 


2. On numbers and counting 


One of them is all alone and ever more shall be so 

Two of them are lily-white boys all clothed all in green Oh 

Three of them are strangers o'er the wide world they are rangers 
Four it is the Dilly Hour when blooms the Gilly Flower 

Five it is the Dilly Bird that's seldom seen but heard 

Six it is the ferryman in the boat that o'er the River floats Oh 

Seven are the Seven Stars in the Sky, the Shining Stars be Seven Oh 
Eight it is the Morning's break when all the World’s awake Oh 

Nine it is the pale Moonshine, the Shining Moon is Nine Oh 

Ten Forgives all kinds of Sin, from Ten begin again Oh 


English traditional folksong 
from Bob Stewart, Where is Saint George? (1977) 


TOPICS: Natural numbers and their representation; induction; useful 
functions; rates of growth; counting labelled and unlabelled 
structures; Handshaking Lemma 


TECHNIQUES: Induction; double counting 
ALGORITHMS: Odometer Principle; [Russian peasant multiplication] 


CROSS-REFERENCES: 


This chapter is about counting. In some sense, it is crucial to what follows, since 
counting is so basic in combinatorics. But this material is part of mathematical 
culture, so you will probably have seen most of it before. 


2.1. Natural numbers and arithmetic 


Kronecker is often quoted as saying about mathematics, ‘God made the integers; 
the rest is the work of man.’ He was referring to the natural numbers (or counting 
numbers), which are older than the earliest archeological evidence. (Zero and the 
negative numbers are much more recent, having been invented (or discovered) in 
historical time.)¹ Since much of combinatorics is concerned with counting, the 
natural numbers have special significance for us. 


1 See Georges Ifrah, From One to Zero: A Universal History of Numbers (1985), for an account of 
the development of numbers and their representation. 


As each new class of numbers was added to the mathematical repertoire, it was 
given a name reflecting the prejudice against its members, or the ‘old’ numbers were 
given a friendly, reassuring name. Thus, zero and negative integers are contrasted 
with the ‘natural’ positive integers. Later, quotients of integers were ‘rational’, as 
opposed to the ‘irrational’ square root of 2; and later still, all numbers rational and 
irrational were regarded as ‘real’, while the square root of −1 was ‘imaginary’ (and 
its friends were ‘complex’). 


The natural numbers are the first mathematical construct with which we become 
familiar. Small children recite the names of the first few natural numbers in the same 
way that they might chant a nursery rhyme or playground jingle. This gives them 
the concept that the numbers come in a sequence. They grasp this in a sophisticated 
way. The rhyme² 


One, two, 
Missed a few, 
Ninety-nine, 
A hundred 


expresses confidence that the sequence of numbers stretches at least up to 100, and 
that the speaker could fill in the gap if pressed. 


Order or progression is thus the most basic property of the natural numbers.³ 
How is this expressed mathematically? First we must stop to consider how natural 
numbers are represented. The simplest way to represent the number n is by a 
sequence of n identical marks. This is probably the earliest scheme mankind 
adopted. It is well adapted for tallying: to move from one number to the next, 
simply add one more mark. However, large numbers are not easily recognisable. 
After various refinements (ranging from grouping the marks in sets of five to the 
complexities of Roman numerals), positional notation was finally adopted. 

This involves the choice of a base b (an integer greater than 1), and b digits 
(distinguishable symbols for the integers 0, 1, 2, ..., b − 1). (Early attempts at positional 
notation were bedevilled because the need for a symbol for zero was not recognised.) 
Now any natural number N is represented by a finite string of digits. Logically the 
string is read from right to left; so we write it as $x_{n-1} \ldots x_1 x_0$, where each $x_i$ is 
one of our digits. By convention, the leftmost digit is never zero. The algorithm for 
advancing to the next number is called the Odometer Principle. It is based on the 
principle of trading in b counters in place i for a single counter in place i + 1, and 
should be readily understood by anyone who has watched the odometer (or mileage 
gauge) of a car. 


2 I have heard the feminist version of this: ‘One, two, Mrs. Few, ...’ 


3 The operations of arithmetic are based on the tacit assumption that we can always pass from any 
number to its successor, and this is the essence of the ordinal concept. Tobias Dantzig, Number: the 
Language of Science (1930). 


(2.1.1) Odometer Principle 
to find the successor of a natural number to base b 
Start by considering the rightmost digit. 

• If the digit we are considering is not b − 1, then replace it by 
the next digit in order, and terminate the algorithm. 

• If we are considering a blank space (to the left of all the digits), 
then write in it the digit 1, and terminate the algorithm. 

• If neither of the above holds, we are considering the digit b − 1. 
Replace it with the digit 0, move one place left, and return to 
the first bullet point. 


For example, if the base b is 2 and the digits are 0 and 1, the algorithm (starting 
with 1) generates successively 10, 11, 100, 101, 110, ... . 
Now it can be proved by induction that the string $x_{n-1} \ldots x_1 x_0$ represents the 
positive integer 

$$x_{n-1} b^{n-1} + \cdots + x_1 b + x_0$$

(see Exercise 2). 
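
The Odometer Principle translates almost line by line into a program. The following Python sketch is one possible rendering, under the assumption that a number is stored as a list of integer digits with the most significant digit first; the names `successor` and `value` are illustrative.

```python
def successor(digits, b):
    """Return the digit list of the next natural number in base b.

    digits holds integers in 0..b-1, most significant first.
    """
    digits = list(digits)          # work on a copy
    pos = len(digits) - 1          # start at the rightmost digit
    while True:
        # The blank-space test comes first here, because in Python a negative
        # index would silently wrap round to the other end of the list.
        if pos < 0:
            digits.insert(0, 1)    # write 1 in the blank space on the left
            return digits
        if digits[pos] != b - 1:
            digits[pos] += 1       # replace by the next digit in order
            return digits
        digits[pos] = 0            # the digit is b - 1: reset it and move left
        pos -= 1


def value(digits, b):
    """Evaluate the digit string as an integer in base b."""
    total = 0
    for d in digits:
        total = total * b + d
    return total


if __name__ == "__main__":
    d = [1]
    for _ in range(5):
        d = successor(d, 2)
        print("".join(map(str, d)), value(d, 2))
```

Comparing `value(successor(d, b), b)` with `value(d, b) + 1` for many inputs is a quick empirical check of the claim that is proved by induction in Exercise 2.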


Often the number 0 is included as a natural number. (This is most usually done 
by logicians, who like to generate the whole number system out of zero, or nothing. 
But it conflicts with our childhood experience: I have never heard a child say 
‘nought, one, two, ...’,⁴ and we don’t count that way.) This is done by modifying 
our representation so that the digit 0 represents the number 0. This is the one 
allowed exception to the rule that the left-most digit cannot be 0; the alternative, 
representing 0 by a blank space, would be confusing. 


The odometer of a car actually works slightly differently. It works with a fixed 
number of digits which are initially all zero, so that the ‘blank space’ case of the 
algorithm cannot arise. If there are k digits, then the integers 0, ..., $b^k - 1$ are 
generated in turn, and then the odometer returns to 0 and the process repeats. 


Now that we have a representation of positive integers, and understand how to 
move to the next integer, we should explore the arithmetic operations (ambition, 
distraction, uglification and derision).⁵ Algorithms for these are taught in primary 
school.⁶ I will not consider the details here. It is a good exercise to program 
a computer to perform these algorithms,⁷ or to investigate how many elementary 
operations are required to add or multiply two n-digit numbers (where elementary 
operations might consist of referring to one’s memory of the multiplication tables, 
or writing down a digit). 


4 A possible exception occurs when one child has been appointed to be first, and another wishes to 
claim precedence, as in ‘Zero the hero’. But this is closer to the historical than the logical approach. 

5 Lewis Carroll, Alice’s Adventures in Wonderland (1865). 

6 These algorithms were known to the Babylonians in 1700 B.C. 

7 Most programming languages specify the ‘maximum integer’ to be something like 32767 or 
2147483647. Often, the answer to a counting problem will be much larger than this. To find it by 
computer, you may have to write routines for arithmetic operations on integers with many digits. If 
you need to do this, write your routines so that you can re-use them! 
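
As an illustration of such a routine, here is a sketch of the column-by-column addition taught in school, written in Python. It assumes the same representation as positional notation above, namely a list of digits with the most significant first; the function name `add_digits` is hypothetical. (Python's built-in integers already have unlimited size, so the sketch is for study rather than necessity.)

```python
def add_digits(x, y, b=10):
    """Add two natural numbers given as digit lists (most significant first)."""
    result = []
    carry = 0
    i, j = len(x) - 1, len(y) - 1      # start at the rightmost digits
    while i >= 0 or j >= 0 or carry:
        column = carry
        if i >= 0:
            column += x[i]
            i -= 1
        if j >= 0:
            column += y[j]
            j -= 1
        result.append(column % b)      # the digit to write down
        carry = column // b            # the amount carried to the next place
    result.reverse()                   # digits were produced right to left
    return result


if __name__ == "__main__":
    print(add_digits([9, 9, 9], [1]))
    print(add_digits([1, 0, 1, 1], [1, 1], b=2))
```

Each pass through the loop handles one column, so adding two n-digit numbers takes at most n + 1 passes.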


2.2. Induction 


Induction is a very powerful principle for proving assertions about the natural 
numbers. It is applied in various different forms, some of which are described in 
this section. We also see that it is a consequence of our most basic intuition about 
the natural numbers. 

The Principle of Induction asserts the following: 


(2.2.1) Principle of Induction 
Let P(n) be a proposition or assertion about the natural number n. 
Suppose that P(1) is true. Suppose also that, if P(n) is true, then 
P(n + 1) is also true. Then P(n) is true for all natural numbers n. 


Why is this true? As we saw, the basic property of the natural numbers, 
recognised even by children, is that we can count up to any natural number n 
starting from 1 (given sufficient patience!). Now, with the assumptions of the 
Principle, P(1) is true, so P(2) is true, so (miss a few here) so P(n — 1) is true, so 
P(n) is true. 

As this argument suggests, if you are reading a mathematical argument, and the 
author puts in a few dots or the words ‘and so on’, there is probably a proof by 
induction hiding there. Consider, for example, the function f satisfying f(1) = 2 
and f(n +1) = 2f(n) for all natural numbers n. Then 


$f(2) = 4 = 2^2, \quad f(3) = 8 = 2^3, \quad \ldots, \quad f(n) = 2^n.$


The dots hide a proof by induction. Let P(n) be the assertion that $f(n) = 2^n$. Then 
P(1) holds; and, assuming that P(n) holds, we have 


$f(n+1) = 2f(n) = 2 \cdot 2^n = 2^{n+1},$


so P(n + 1) also holds. So the Principle of Induction justifies the conclusion. The 
point is that very simple arguments by induction can be written out with three 
dots in place of the detailed verification, but this verification could be supplied if 
necessary. We'll see more examples of this later. 


Now I give some alternative forms of the Principle of Induction and justify their 
equivalence. The first one is transparent. Suppose that P(n) is an assertion, for 
which we know that P(27) is true, and that if P(n) holds then so does P(n + 1). 
Then we conclude that P(n) holds for all n ≥ 27. (To prove this formally, let Q(n) 
be the assertion that P(n + 26) is true, and verify the hypotheses of the Principle of 
Induction for Q(n).) 

For the next variation, let P(n) be a proposition about natural numbers. Suppose 
that, for every natural number n, if P(m) holds for all natural numbers m less than 
n, then P(n) holds. Can we conclude that P(n) holds for all n? On the face of it, 
this seems a much stronger principle, since the hypothesis is much weaker. (Instead 
of having to prove P(n) from just the information that P(n — 1) holds, we may 
assume the truth of P(m) for all smaller m.) But it is true, and it follows from the 
Principle as previously stated. 

We let Q(n) be the statement ‘P(m) holds for all m < n’. Now it is clear that 
Q(n +1) implies P(n), so we will have succeeded if we can prove that Q(n) holds 
for all n. We prove this by induction. 

First, Q(1) holds: for there are no natural numbers less than 1, so the assertion 
P holds for all of them (vacuously). 

Now suppose that Q(n) holds. That is, P(m) holds for all m < n. By assumption, 
P(n) also holds. Now P(m) holds for all m < n + 1 (since the numbers less than 
n + 1 are just n and the numbers less than n).⁸ In other words, Q(n + 1) holds. 

Now the Principle of Induction shows that Q(n) holds for all n. 

The final re-formulation gives us the technique of ‘Proof by Minimal Counterexample’. 
Suppose that P(n) is a proposition such that it is not true that P(n) holds 
for all natural numbers n. Then there is a least natural number n for which P(n) is 
false; in other words, P(m) is true for all m < n but P(n) is false. For suppose that 
no such n exists; then the truth of P(m) for all m < n entails the truth of P(n), 
and as we have seen, this suffices to show that P(n) is true for all n, contrary to 
assumption. 

This argument shows that any non-empty set of natural numbers contains a 
minimal element. (If S is the set, let P(n) be the assertion n ∉ S.) 


2.3. Some useful functions 


I assume that you are familiar with common functions like polynomials, the function 
|x| (the absolute value or modulus), etc. 


Floor and Ceiling. The floor of a real number x, written $\lfloor x \rfloor$, is the greatest integer 
not exceeding x. In other words, $\lfloor x \rfloor$ is the integer m such that m ≤ x < m + 1. If x 
is an integer, then $\lfloor x \rfloor = x$. This function is sometimes written [x]; but the notation 
$\lfloor x \rfloor$ suggests ‘rounding down’. It is the number of the floor of a building on which 
x would be found, if the height of x above the ground is measured in units of the 
distance between floors. (The British system of floor numbering is used, so that the 
ground floor is number 0.) 

The ceiling is as you would probably expect: $\lceil x \rceil$ is the smallest integer not less 
than x. So, if x is not an integer, then $\lceil x \rceil = \lfloor x \rfloor + 1$; if x is an integer, its floor and 
ceiling are equal. In any case, you can check that 


$\lceil x \rceil = -\lfloor -x \rfloor.$
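
These facts about the floor and ceiling are easy to test on examples. The sketch below uses Python's `math.floor` and `math.ceil` with exact fractions, so that rounding errors play no part; the helper name `floor_ceiling_facts` is invented for this illustration.

```python
import math
from fractions import Fraction


def floor_ceiling_facts(x):
    """Check, for one value x, the properties of floor and ceiling stated above."""
    fl, ce = math.floor(x), math.ceil(x)
    return (
        fl <= x < fl + 1                       # floor is the m with m <= x < m + 1
        and ce == -math.floor(-x)              # ceiling(x) = -floor(-x)
        and (ce == fl if x == fl else ce == fl + 1)
    )


if __name__ == "__main__":
    for x in (Fraction(7, 2), Fraction(-7, 2), Fraction(5), Fraction(1, 3)):
        print(x, math.floor(x), math.ceil(x), floor_ceiling_facts(x))
```

Note that the floor of a negative non-integer lies further from zero than the number itself, which is the case most easily got wrong.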


Factorial. The factorial function is defined on positive integers by the rule that n! is 
the product of all the integers from 1 to n inclusive. It satisfies the condition 


$n! = n \cdot (n-1)!$    (*)


for n > 1. In fact, we can consistently define 0! = 1; then (*) holds for all n > 0. 
In fact, the conditions 0! = 1 and (*) actually define n! for all natural numbers n. 
(This is proved by induction: 0! is defined; if n! is defined then so is (n + 1)!; so n! 
is defined for all n.) 


8 Let p be an integer less than n + 1. Then p < n or p = n or p > n; and the last case is impossible, 
since there is no integer between n and n + 1. 
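
The remark that 0! = 1 together with (*) defines the factorial is exactly what a recursive program expresses. A minimal Python sketch follows, with the hypothetical name `fact`; the standard library's `math.factorial` computes the same values.

```python
def fact(n):
    """Factorial defined by the two conditions 0! = 1 and n! = n * (n - 1)!."""
    if n < 0:
        raise ValueError("the factorial is defined here only for n >= 0")
    if n == 0:
        return 1                 # the base case 0! = 1
    return n * fact(n - 1)       # condition (*)


if __name__ == "__main__":
    print([fact(n) for n in range(8)])
```

The recursion terminates for the same reason that the induction works: each call refers to a smaller natural number until 0 is reached.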


Exponential and logarithm. These two functions are familiar from elementary 
calculus. We will often use the power series expansions of them. The equation 

$$e^x = \sum_{n \ge 0} \frac{x^n}{n!}$$

is valid for all real numbers x. On the other hand, the function log x can’t be 
expanded as a series of powers of x, since log 0 is undefined. Instead, we have 

$$\log(1+x) = \sum_{n \ge 1} \frac{(-1)^{n-1} x^n}{n} = x - \frac{x^2}{2} + \frac{x^3}{3} - \cdots,$$

which is valid for all x with |x| < 1 (and in fact also for x = 1). 
The exponential function grows more rapidly than any power of x; this means 
that $e^x > x^c$ for all sufficiently large x (depending on c). In fact, for x > (c + 1)!, we 
have 

$$e^x > \frac{x^{c+1}}{(c+1)!} > x^c.$$

On the other hand, the logarithm function grows more slowly than any power of x. 
We will often write exp(x) instead of $e^x$. 
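
The bound on the exponential function, namely that e^x exceeds x^(c+1)/(c+1)!, which in turn exceeds x^c once x > (c + 1)!, follows in two short steps from the power series. The derivation below is a sketch under the assumption that c is a non-negative integer, so that (c + 1)! makes sense.

```latex
% Assumption: c is a non-negative integer and x > (c+1)! > 0.
\begin{align*}
e^x &= \sum_{n \ge 0} \frac{x^n}{n!}
     > \frac{x^{c+1}}{(c+1)!}
     && \text{(every term is positive; keep only } n = c+1\text{)} \\
    &= x^c \cdot \frac{x}{(c+1)!}
     > x^c
     && \text{(since } x > (c+1)!\text{)}.
\end{align*}
```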


2.4. Orders of magnitude 


People use the phrase ‘the combinatorial explosion’ to describe a counting function 
which grows very rapidly. This is a common phenomenon, and it means that, while 
we may be able to give a complete description of all the objects being counted 
for small values of the parameter, soon there will be far too many for this to be 
possible, and maybe even far too many for an exact count; we may have to make do 
with fairly rough estimates for the counting function. I will consider now what such 
rough estimates might look like. In this section, some results from later chapters will 
be anticipated. If you are unfamiliar with these, take them on trust until we meet 
them formally. 

Let X be a set with n elements, say X = {1, 2, ..., n}. The number of subsets 
of X is $2^n$. This is the most familiar example of an exponential function, or function 
with exponential growth. A function f which has (precisely) exponential growth has 
the property that 

$$f(n+1) = c\,f(n)$$

for some c > 1. (If c = 1, the function is constant; if c < 1, then f(n) → 0 as n → ∞. 
In these cases, the term ‘exponential growth’ is not really appropriate!⁹) A function 
f satisfying the above equation is given by $f(n) = ac^n$, where a is a constant (and 
is equal to the value of f(0)). 


9 Economists define a recession as a period when the exponential constant for the GDP is less than 
1.004. Sometimes you have to run in order to stand still. 

We also say that a function f has ‘exponential growth’ if it is roughly the same
size as an exponential function. So the function $f(n) = 2^n + n$ has exponential
growth, since the term n is dwarfed by $2^n$ for large n. Formally, the function f is
said to have exponential growth if $f(n)^{1/n}$ tends to a limit c > 1 as n → ∞. This
means that, for any positive number $\epsilon$, f(n) lies between $(c - \epsilon)^n$ and $(c + \epsilon)^n$ for all
sufficiently large n. The number c is called the exponential constant for f.


Of course, a function may grow more slowly than exponentially. Examples
include
• polynomial growth with degree c, like the function $f(n) = n^c$;
• fractional exponential growth with exponent c, like the function $e^{n^c}$, where
0 < c < 1.
These functions arise in real combinatorial counting problems, as we will see.
But many functions grow faster than exponentially. Here are two examples. 


The number of permutations of the set X is equal to n! = n(n − 1)···1, the
product of the integers from 1 to n inclusive. We have

$$2^{n-1} \le n! \le n^{n-1},$$

because (ignoring the factor 1) there are n − 1 factors, each lying between 2 and
n. In fact it is easy to see that the growth is not exponential. We will find better
estimates in the next chapter.


Now let P(X), the power set of X, denote the set of all subsets of X. We will
be considering subsets of P(X), under the name families of sets. How many families
of sets are there? Clearly the number is $2^{2^n}$. This number grows much faster than
exponentially, and much faster than the factorial function. A function like this is
called a double exponential.


For comparing the magnitudes of functions like these, it is often helpful to 
consider the logarithm of the function, rather than the function itself. The logarithm 
of an exponential function is a (roughly) linear function. The logarithm of n! is 
fairly well approximated by n log n; and the logarithm of a double exponential is
exponential. Other possibilities are functions whose logarithms are polynomial. 


Of course, this is only the beginning of a hierarchy of growth rates; but for the 
most part we won’t have to consider anything worse than a double exponential. 
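The comparison of logarithms described above can be tried numerically. The following is a minimal sketch, not part of the original text; the function name `log_growth_table` is an illustrative choice, and natural logarithms are assumed throughout.

```python
import math

def log_growth_table(ns):
    """Return rows (n, log 2^n, log n!, n log n, log 2^(2^n)), natural logs."""
    rows = []
    for n in ns:
        log_exp = n * math.log(2)            # log of 2^n: linear in n
        log_fact = math.lgamma(n + 1)        # log of n! (lgamma(n + 1) = log n!)
        n_log_n = n * math.log(n)            # rough approximation to log n!
        log_double = (2 ** n) * math.log(2)  # log of 2^(2^n): exponential in n
        rows.append((n, log_exp, log_fact, n_log_n, log_double))
    return rows

for row in log_growth_table([5, 10, 20]):
    print("n=%d  log 2^n=%.1f  log n!=%.1f  n log n=%.1f  log 2^(2^n)=%.1f" % row)
```

Working with logarithms keeps every quantity within floating-point range, whereas $2^{2^n}$ itself already has over 300,000 digits at n = 20.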


In connection with growth rates, there is a convenient analytic notation. We
write O(f(n)) (read ‘big Oh f(n)’) to mean a (possibly unknown) function g(n) such
that, for all sufficiently large n, |g(n)| ≤ cf(n) for some constant c. This is typically
used in the form

$$\phi(n) = F(n) + O(f(n)),$$

where $\phi$ is a combinatorial counting function and F, f are analytic functions where
f grows more slowly than F; this has the interpretation that the order of magnitude
of $\phi$ is similar to that of F. For example, in Section 3.6, we show that

$$\log n! = n \log n - n + O(\log n).$$

We write o(f(n)) (and say ‘small oh f(n)’) to mean a function g(n) such that
g(n)/f(n) → 0 as n → ∞; that is, g is of smaller order of magnitude than f.

There are several variants. For example, Ω is the opposite of O; that is, Ω(f(n))
is a function g(n) with |g(n)| ≥ cf(n) for some constant c > 0. Also, g(n) ∼ f(n)
means that both g(n) = O(f(n)) and g(n) = Ω(f(n)) hold: roughly, f and g have
the same order of magnitude apart from a constant factor.


2.5. Different ways of counting 


In combinatorics (unlike real life¹⁰), when we are asked to count something, there
are very many different answers which can be regarded as correct. Consider the
simple problem of choosing three items from a set of five. Before we can work out
the right answer, the problem must be specified more precisely. Are the objects in
the set identical (five electrons, say, or five red billiard balls), or all different (the
ace, two, three, four, and five of spades, for example)? Does the order of selection 
matter? (That is, do we just put in a hand and pull three objects out, or do we draw 
them one at a time and record the order?) And are we allowed to choose the same 
object more than once (say, by recording the result of each draw and returning the 
object to the urn), or not? There are various intermediate cases, like making words 
using the letters of a given word, where a letter may be repeated but not more often 
than it occurs in the original word. 


Almost always, we assume that the objects are distinguishable, like the five 
spade cards. Under this assumption, the problem will be solved under the four 
possible combinations of the other assumptions in Chapter 3. What if they are 
indistinguishable? In this case, there is obviously only one way to select three red 
billiard balls from a set of five: any three red billiard balls are identical to any other 
three. 




What difference does indistinguishability make? If the underlying objects are 
distinguishable, we can assume that they carry labels bearing the numbers 1,2,..., n. 
In this case, we say that the configurations we are counting are labelled. If the n 
underlying objects are indistinguishable, we are counting unlabelled things. An 
example will illustrate the difference. 


Suppose that we are interested in n towns; some pairs of towns are joined by 
a direct road, others not. We are not concerned with the geographical locations, 
only in whether the towns are connected or not. (This is described by the structure 
known as a graph.¹¹ See Chapter 11 for more about graphs.) Figure 2.1 shows the
eight labelled graphs for n = 3. If the towns are indistinguishable, then the second,
third and fifth graphs are identical, as are the fourth, sixth and seventh. So there
are just four unlabelled graphs with n = 3.


Fig. 2.1. Graphs on three vertices


10 According to folklore, it is impossible to count the Rollright Stones consistently.


11 This usage of the term is quite different from the sense in the phrase ‘the graph of y = sin x’.
Some people distinguish the two meanings by different pronunciation, with a short a for the sense
used here.


In general, let f(n) and g(n) denote the numbers of labelled and unlabelled
configurations, respectively, with n underlying objects. Then two labelled configurations
will be regarded as identical as unlabelled configurations if and only if there
is a permutation of {1,2,...,n} which carries one to the other. (For example, the
cyclic permutation 1 ↦ 2 ↦ 3 ↦ 1 carries the second graph in Fig. 2.1 to the fifth.)
So at most n! labelled configurations collapse into a single unlabelled one, and we
have

$$\frac{f(n)}{n!} \le g(n) \le f(n).$$
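For small n, the two counts for graphs can be found by brute force. The sketch below is an illustration only, under the assumption that a graph is stored as a set of vertex pairs; the name `count_graphs` is hypothetical. It takes the smallest relabelled edge list as a canonical form, which is feasible only for very small n.

```python
from itertools import combinations, permutations

def count_graphs(n):
    """Return (labelled, unlabelled) counts of graphs on vertices 0..n-1."""
    pairs = list(combinations(range(n), 2))
    perms = list(permutations(range(n)))
    seen = set()      # canonical forms of the graphs met so far
    labelled = 0
    for mask in range(2 ** len(pairs)):
        edges = [pairs[i] for i in range(len(pairs)) if (mask >> i) & 1]
        labelled += 1
        # Canonical form: the smallest sorted edge list over all relabellings.
        canon = min(
            tuple(sorted(tuple(sorted((p[u], p[v]))) for u, v in edges))
            for p in perms
        )
        seen.add(canon)
    return labelled, len(seen)

for n in (2, 3, 4):
    f, g = count_graphs(n)
    print(n, f, g)
```

For n = 3 this gives 8 labelled and 4 unlabelled graphs, in agreement with the count in the text.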


Now there are two possibilities for the ‘order of magnitude’ behaviour. 

If f(n) grows much more rapidly than n!, then the left and right hand sides
of this inequality are not so very far apart, and we have a reasonable estimate for
g(n). For example, we saw that there are $2^{2^n}$ families of subsets of the n-element set
X. The number of permutations is insignificant by comparison, so it doesn’t matter 
very much whether the elements of X are distinguishable or not, that is, whether we 
count labelled or unlabelled families. 

But if this doesn’t occur, then more care is needed. There are just $2^n$ subsets of
the n-element set X, and this function grows more slowly than n!. In this case, we
can count unlabelled sets another way. If all elements of X are indistinguishable,
then the only thing we can tell about a subset of X is its cardinality; two subsets
containing the same number of elements are equivalent under a permutation. So the
number of unlabelled subsets is n + 1, since the cardinality of a subset can take any
one of the n + 1 values 0,1,2,...,n.


This theme can be refined, using the concepts of permutation group and cycle
index. These are more advanced topics, and will be treated in Part 2 (see Chapter 15).


2.6. Double counting


We come now to a deceptively simple but enormously important counting principle: 


If the same set is counted in two different ways, the answers are the 
same. 


This is analogous to finding the sum of all the entries in a matrix by adding the row 
totals, and then checking the calculation by adding the column totals. 

The principle is best illustrated by applications (of which there will be many 
later) — here is one: 

(2.6.1) Handshaking Lemma
At a convention, the number of delegates who shake hands an odd 
number of times is even. 


To show this, let $D_1, \ldots, D_n$ be the delegates. We apply double counting to the
set of ordered pairs $(D_i, D_j)$ for which $D_i$ and $D_j$ shake hands with each other at
the convention. Let $x_i$ be the number of times that $D_i$ shakes hands, and y the
total number of handshakes that occur. On the one hand, the number of pairs is
$\sum_{i=1}^{n} x_i$, since for each $D_i$ the number of choices of $D_j$ is equal to $x_i$. On the other
hand, each handshake gives rise to two pairs $(D_i, D_j)$ and $(D_j, D_i)$; so the total is
2y. Thus

$$\sum_{i=1}^{n} x_i = 2y.$$

But, if the sum of n numbers is even, then evenly many of the numbers are odd.
(If we add an odd number of odd numbers and any number of even numbers, the
answer will be odd.)
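The argument can be checked on simulated conventions. The following sketch is illustrative only: the function `handshake_degrees`, the handshake probability `p` and the seed are assumptions made for the example, not anything from the text.

```python
import random

def handshake_degrees(n, p=0.5, seed=0):
    """Simulate a convention of n delegates; each pair shakes hands with probability p.

    Returns (x, y): x[i] is the number of handshakes of delegate i,
    and y is the total number of handshakes.
    """
    rng = random.Random(seed)
    x = [0] * n
    y = 0
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                x[i] += 1
                x[j] += 1
                y += 1
    return x, y

x, y = handshake_degrees(9)
odd_delegates = sum(1 for d in x if d % 2 == 1)
print(sum(x) == 2 * y, odd_delegates % 2 == 0)
```

Whatever the seed, the sum of the $x_i$ equals 2y, so the number of delegates with an odd count is even.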


The double counting principle is usually applied to counting ordered pairs. 
For lovers of formalism, here is a general result, which encapsulates most of the 
applications we will make of it. 


(2.6.2) Proposition. Let $A = \{a_1, \ldots, a_m\}$ and $B = \{b_1, \ldots, b_n\}$ be sets. Let S be a
subset of A × B. Suppose that, for i = 1,...,m, the element $a_i$ is the first component
of $x_i$ pairs in S, while, for j = 1,...,n, the element $b_j$ is the second component of
$y_j$ pairs in S. Then

$$|S| = \sum_{i=1}^{m} x_i = \sum_{j=1}^{n} y_j.$$

Often it happens that $x_i$ is constant (say x) and $y_j$ is also constant (say y). Then
we have

$$mx = ny.$$
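As a small illustration of the proposition, the sketch below counts a set S of pairs by first components and by second components. The relation used (a divides b, with A = {1,...,6} and B = {1,...,12}) and the name `double_count` are assumptions chosen for the example.

```python
def double_count(A, B, related):
    """Count S = {(a, b) : related(a, b)} directly, by rows and by columns."""
    A, B = list(A), list(B)
    S = {(a, b) for a in A for b in B if related(a, b)}
    x = {a: sum(1 for b in B if (a, b) in S) for a in A}  # x_i: pairs with first component a_i
    y = {b: sum(1 for a in A if (a, b) in S) for b in B}  # y_j: pairs with second component b_j
    return len(S), sum(x.values()), sum(y.values())

size, by_rows, by_cols = double_count(range(1, 7), range(1, 13), lambda a, b: b % a == 0)
print(size, by_rows, by_cols)
```

All three numbers agree, as the proposition asserts.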


2.7. Appendix on set notation 


The basic notation for sets is listed here. If A and B are sets, then we write x ∈ A
if x is an element of A, x ∉ A otherwise. Also

|A| (the cardinality of A) is the number of elements in A;

A ∪ B (the union) is the set of elements in A or B (or both);

A ∩ B (the intersection) is the set of elements in both A and B;

A \ B (the difference) is the set of elements in A but not B;

A △ B (the symmetric difference) is the set of elements in just one of the two sets;

A ⊆ B if every element of A belongs to B;

A = B if A and B have exactly the same elements.

So, for example,

$$A \triangle B = (A \setminus B) \cup (B \setminus A) = (A \cup B) \setminus (A \cap B),$$
$$|A \cup B| + |A \cap B| = |A| + |B|.$$


The notation {x : P} means the set of all elements x having property P. So, for
example,

$$A \cup B = \{x : x \in A \text{ or } x \in B\}.$$

Similarly, {x,y} is the set consisting of the elements x and y only. It is sometimes
called an unordered pair, since {x,y} = {y,x}. By contrast, the ordered pair (x,y)
has the property that (x,y) = (u,v) if and only if x = u and y = v. This is familiar
from Cartesian coordinates of points in the Euclidean plane.

The Cartesian product A × B is the set of all ordered pairs (a,b), with a ∈ A
and b ∈ B. Similarly for more than two factors. For example, we write $A^n$ for the
set of ordered n-tuples of elements of A, for any positive integer n. We have

$$|A \times B| = |A| \cdot |B|,$$
$$|A^n| = |A|^n.$$


Until last century, a function was something described by a formula (typically a
polynomial or a power series); it was the ambiguity in this definition which led to
the modern version. A function f from A to B is a subset of A × B with the property
that, for any a ∈ A, there is a unique b ∈ B such that (a,b) ∈ f. If (a,b) ∈ f, we
write f(a) = b.¹² Usually there is a rule for calculating b = f(a) from a, but this is
not part of the definition.

If $A = \{a_1, a_2, \ldots, a_n\}$, then any function f : A → B can be specified by giving
the n-tuple of values $(f(a_1), f(a_2), \ldots, f(a_n))$. Thus the number of functions from
A to B is $|B|^{|A|}$. Motivated by this, the set of functions from A to B is sometimes
written $B^A$, so that $|B^A| = |B|^{|A|}$.

The power set P(A) is the set of all subsets of A. Any subset X of A is specified
by its characteristic function, the function $f_X : A \to \{0,1\}$ defined by

$$f_X(a) = \begin{cases} 1 & \text{if } a \in X; \\ 0 & \text{if } a \notin X. \end{cases}$$

(Two subsets are equal if and only if their characteristic functions are equal.) So
there are as many subsets of A as there are functions from A to {0,1}; that is,

$$|P(A)| = 2^{|A|}.$$


2.8. Exercises


1. Criticise the following proof that 1 is the largest natural number. 


Let n be the largest natural number, and suppose that n ≠ 1. Then
n > 1, and so $n^2 > n$; thus n is not the largest natural number.


12 This definition is very familiar, despite appearances. You probably visualise ‘the function $y = x^2$’ in
terms of its graph in the Euclidean plane with coordinates (x, y); and the graph consists of precisely
those ordered pairs (x, y) for which $y = x^2$. In other words, the graph is the function!

2. Prove by induction that the Odometer Principle with base b does indeed give the
representation $x_{n-1} \ldots x_1 x_0$ for the natural number

$$N = x_{n-1} b^{n-1} + \cdots + x_1 b + x_0.$$

3. (a) Prove by induction that

$$n! > \left(\frac{n}{e}\right)^n$$

for n ≥ 1. (You may use the fact that $\left(1 + \frac{1}{n}\right)^n < e$ for all n.)

(b) Use the arithmetic-geometric mean inequality¹³ to show that $n! < \left(\frac{n+1}{2}\right)^n$ for
n > 1, and deduce that

$$n! < e \left(\frac{n}{2}\right)^n$$

for n > 1.

4. (a) Prove that log x grows more slowly than $x^c$ for any positive number c.

(b) Prove that, for any c, d > 1, we have $c^x > x^d$ for all sufficiently large x.

5. (a) We saw that there are $2^8 = 256$ labelled families of subsets of a 3-set. How
many unlabelled families are there?

(b) Prove that the number F(n) of unlabelled families of subsets of an n-set
satisfies $\log_2 F(n) = 2^n + O(n \log n)$.

6. Verify that the numbers of graphs are as given in Table 2.1 for n ≤ 5.

n             2    3    4     5
labelled      2    8    64    1024
unlabelled    2    4    11    34

Table 2.1. Graphs


7. Suppose that an urn contains four balls with different colours. In how many 
ways can three balls be chosen? As in the text, we may be interested in the order 
of choice, or not; and we may return balls to the urn, allowing repetitions, or not. 
Verify the results of Table 2.2.

                          order        order
                          important    unimportant
repetition allowed        64           20
repetition not allowed    24           4

Table 2.2. Selections


8. A Boolean function takes n arguments, each of which can have the value TRUE 
or FALSE. The function takes the value TRUE or FALSE for each choice of values of 
its arguments. Prove that there are $2^{2^n}$ different Boolean functions. Why is this the
same as the number of families of sets? 


13 The arithmetic-geometric mean inequality states that the arithmetic mean of a list of positive
numbers is greater than or equal to their geometric mean, with equality only if all the numbers are 
equal. Can you prove it? (Hint: Do the special case when all but one of the numbers are equal by 
calculus, and then the general case by induction.) 

9. Logicians define a natural number to be the set of all its predecessors: so 3 is the
set {0,1,2}. Why do they have to start counting at 0?

10. A function f has polynomial growth of degree d if there exist positive real numbers
a and b such that $an^d \le f(n) \le bn^d$ for all sufficiently large n. Suppose that f has
polynomial growth, and g has exponential growth with exponential constant greater
than 1 (as defined in the text). Prove that f(n) < g(n) for all sufficiently large n. If
$f(n) = 10^6 n^{10^6}$ and $g(n) = (1.000001)^n$, how large is ‘sufficiently large’?

11. Let B be a set of subsets of the set {1,2,...,v}, containing exactly b sets. 


Suppose that
• every set in B contains exactly k elements;
• for i = 1,2,...,v, the element i is contained in exactly r members of B.

Prove that bk = vr.

Give an example of such a system, with v = 6, k = 3, b = 4, r = 2.

12. The ‘Russian peasant algorithm’ for multiplying two natural numbers m and n 
works as follows.¹⁴


(2.7.3) Russian peasant multiplication
to multiply two natural numbers m and n

Write m and n at the head of two columns.

REPEAT the sequence
• halve the last number in the first column (discarding the remainder) and write it under this number;
• double the last number in the second column and write it under this number;
UNTIL the last number in the first column is 1.

For each even number in the first column, delete the adjacent
entry in the second column. Now add the remaining numbers in
the second column. Their sum is the answer.


For example, to calculate 18 × 37:

18     37   (deleted)
 9     74
 4    148   (deleted)
 2    296   (deleted)
 1    592
      ---
      666

Table 2.3. Multiplication
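The procedure translates directly into a short program. This is a sketch of the steps as stated above, with the hypothetical name `russian_peasant`; it returns the two columns as well as the answer so that the table can be inspected.

```python
def russian_peasant(m, n):
    """Multiply natural numbers m >= 1 and n by halving and doubling.

    Returns (product, rows), where rows lists the two columns.
    """
    if m < 1:
        raise ValueError("m must be a positive integer")
    rows = [(m, n)]
    while rows[-1][0] != 1:
        a, b = rows[-1]
        rows.append((a // 2, 2 * b))  # halve (discarding remainder), double
    # Keep only the second-column entries next to odd first-column entries.
    product = sum(b for a, b in rows if a % 2 == 1)
    return product, rows

product, rows = russian_peasant(18, 37)
for a, b in rows:
    print(a, b, "" if a % 2 == 1 else "(deleted)")
print(product)
```

Only halving, doubling and addition are used, which is the point of the method.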


PROBLEMS. (i) Prove that this method gives the right answer. 


14 No tables needed, except two times! 

(ii) What is the connection with the primary school method of long multiplication?

HINT FOR (i) AND (ii): Express m (and n) to the base 2.

(iii) Suppose we change the algorithm by squaring (instead of doubling) the numbers
in the second column, and, in the last step, multiplying (rather than adding)
the undeleted numbers. Prove that the number calculated is $n^m$. How many
multiplications does this method require?


13. According to the Buddha, 


Scholars speak in sixteen ways of the state of the soul after death. 
They say that it has form or is formless; has and has not form, 
or neither has nor has not form; it is finite or infinite; or both or 
neither; it has one mode of consciousness or several; has limited 
consciousness or infinite; is happy or miserable; or both or neither. 


How many different possible descriptions of the state of the soul after death do you 
recognise here? 


14. The library of Babel¹⁵ consists of interconnecting hexagonal rooms. Each room
contains twenty shelves, with thirty-five books of uniform format on each shelf. 
A book has four hundred and ten pages, with forty lines to a page, and eighty 
characters on a line, taken from an alphabet of twenty-five orthographical symbols 
(twenty-two letters, comma, period and space). Assuming that one copy of every 
possible book is kept in the library, how many rooms are there? 


15. COMPUTER PROJECT. Develop a suite of subroutines for performing arithmetic on 
integers of arbitrary size, regarded as strings of digits. (You should deal with input 
and output, arithmetic operations — note that division should return a quotient 
and a remainder — and comparisons. You might continue with exponentiation and 
factorials, as well as various combinatorial functions to be defined later.) 


15 Jorge Luis Borges, Labyrinths (1964).


3. Subsets, partitions, permutations 


The emphasis on mathematical methods seems to be shifted more towards 
combinatorics and set theory — and away from the algorithm of differential 
equations which dominates mathematical physics. 


J. von Neumann & O. Morgenstern,
Theory of Games and Economic Behaviour (1944). 


The process is directed always towards analysing and separating the material
into a collection of discrete counters, with which the detached intellect can 
make, observe and enjoy a series of abstract, detailed, artificial patterns of 
words and images (you may be reminded of the New Criticism). .. 


Elizabeth Sewell, ‘Lewis Carroll and T. S. Eliot as Nonsense Poets’ 
in Neville Braybrooke (ed.), T. S. Eliot (1958).


TOPICS: Subsets, binomial coefficients, Pascal’s Triangle, Binomial
Theorem; [congruences of binomial coefficients]; permutations, or- 
dered and unordered selections, cycle decomposition of a permuta- 
tion; estimates for the factorial function; relations; [finite topolo- 
gies; counting trees]; partitions, Bell numbers 


TECHNIQUES: Binomial coefficient identities; use of double counting; 
estimates via integration 

ALGORITHMS: Sequential and recursive generation of combinatorial 
objects 


CROSS-REFERENCES: Odometer Principle; double counting (Chap- 
ter 2); recurrence relations (Chapter 4) 


This chapter is about the central topic of ‘classical’ combinatorics, what is often 
referred to as ‘Permutations and Combinations’. Given a set with n elements, how 
many ways can we choose a selection of its elements, with or without respect to the 
order of selection, or divide it up into subsets? We'll define the various numbers 
involved, and prove some of their properties; but these echo through subsequent 


chapters. 

3.1. Subsets


How many subsets does a set of n elements have? 


The number of subsets is $2^n$. There are several different ways to see this. Perhaps
most easily, for each of the n elements of the set, there are two choices in building
a subset (viz., put the element in, or leave it out); all combinations of these choices
are possible, giving a total of $2^n$.

Implicitly, this argument sets up a bijection between the subsets of a set X and
the functions from X to {0,1}. The function $f_Y$ corresponding to the subset Y is
defined by the rule

$$f_Y(x) = \begin{cases} 1 & \text{if } x \in Y; \\ 0 & \text{if } x \notin Y. \end{cases}$$

Conversely, a function f corresponds to the set Y = {x ∈ X : f(x) = 1}. The
function $f_Y$ is called the characteristic function or indicator function of Y.

If X = {0,1,...,n − 1}, then we can represent a function f : X → {0,1} by
the n-tuple (f(0), f(1),..., f(n − 1)) of its values. Thus subsets of X correspond to
n-tuples of zeros and ones.

We can take this one step further, and regard the n-tuple as the base 2 representation
of an integer

$$N = f(n-1)2^{n-1} + \cdots + f(1)2 + f(0),$$

as described in Chapter 2. Each n-tuple corresponds to a unique integer; the smallest
is 0 (corresponding to the empty set), and the largest is $2^{n-1} + \cdots + 2 + 1 = 2^n - 1$
(corresponding to the whole set X), and every integer between represents a unique
subset. So the number of subsets is equal to the number of integers between 0 and
$2^n - 1$ (inclusive), namely $2^n$.

Note that this method gives a convenient numbering of the subsets of the set
{0,...,n − 1}: the k-th subset $X_k$ corresponds to the integer k, where $0 \le k \le 2^n - 1$.
The set $X_k$ is easily recovered by writing k to base 2. The numbering has some
further virtues. For example, the set $X_k$ depends only on k, and not on the particular
value of n used; replacing n by a larger value doesn’t change it. So we get a unique
set $X_k$ of non-negative integers corresponding to each non-negative integer k. For
another nice property, see Exercise 2.
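The numbering is easy to carry out on a computer. In the sketch below (the names `subset_from_index` and `index_from_subset` are illustrative assumptions), the set $X_k$ consists of the positions of the 1-bits of k.

```python
def subset_from_index(k):
    """Return X_k: the positions of the 1-bits in the base 2 form of k."""
    return {i for i in range(k.bit_length()) if (k >> i) & 1}

def index_from_subset(s):
    """Inverse map: the integer whose base 2 digits mark the elements of s."""
    return sum(1 << i for i in s)

n = 3
for k in range(2 ** n):
    print(k, format(k, "03b"), sorted(subset_from_index(k)))
```

Since `subset_from_index` never refers to n, the set $X_k$ visibly depends only on k, as remarked above.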


Yet another proof of the formula for the number F(n) of subsets of an n-set is
obtained by noting that we can find all subsets of {1,...,n + 1} by taking all subsets
of {1,...,n} and extending each in the two possible ways: either do nothing,
or add the element n + 1. So F(n + 1) = 2F(n). This is a recurrence relation, by
which the value of F is determined by its values on smaller arguments. Recurrence
relations form the subject of the next chapter.


3.2. Subsets of fixed size 


Let n and k be non-negative integers, with 0 ≤ k ≤ n. The binomial coefficient $\binom{n}{k}$ is
defined to be the number of k-element subsets of a set of n elements. (The number
obviously doesn’t depend on which n-element set we use.) This number is often
written as ${}^nC_k$, and is read ‘n choose k’. It is called a binomial coefficient (for reasons
to be elaborated later).


(3.2.1) Formula for binomial coefficients

$$\binom{n}{k} = \frac{n(n-1)\cdots(n-k+1)}{k(k-1)\cdots 1} = \frac{n!}{k!\,(n-k)!}$$

Note that $\binom{n}{0} = 1$ (the empty set) and $\binom{n}{n} = 1$ (the whole set); the proposed
formula is correct in these cases, in view of the convention that 0! = 1 (see
Section 2.3).

As suggested by the name, we prove this by counting choices. Given a set X of
n elements, in how many ways can we choose a set of k of them? Clearly there are
n possible choices for the ‘first’ element, (n − 1) choices for the ‘second’, ... , and
(n − k + 1) choices for the ‘k-th’; in total, n(n − 1)···(n − k + 1). But we put the terms
‘first’, ‘second’, etc., in quotes because a subset has no distinguished first, second, ...
element. In other words, if the same k elements were chosen in a different order, the
same subset would result. So we must divide this number by the number of orders
in which the k elements could have been chosen. Arguing exactly as before, there
are k choices for which one is ‘first’, (k − 1) for which is ‘second’, and so on. Division
gives the middle expression in the box. Now the third expression is equal to the
second because n(n − 1)···(n − k + 1) = n!/(n − k)!; the denominator cancels all
the factors from n − k on in the numerator.
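The counting argument can be mirrored in code and compared with a direct enumeration of subsets. This is a minimal sketch; the function name `binomial_by_choices` is an assumption made for the example.

```python
from itertools import combinations
from math import factorial

def binomial_by_choices(n, k):
    """Count k-subsets as (ordered choices) / (orderings of each subset)."""
    ordered = 1
    for i in range(k):
        ordered *= n - i              # n(n-1)...(n-k+1) ordered selections
    return ordered // factorial(k)    # each subset arises in k! orders

n, k = 6, 3
print(binomial_by_choices(n, k), len(list(combinations(range(n), k))))
```

The division is always exact, since every k-subset is counted exactly k! times among the ordered selections.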

Once we have a formula, there are two possible ways to prove assertions of
identities about binomial coefficients. There is a combinatorial proof, arguing from
the definition (we will interpret $\binom{n}{k}$ as the number of ways of choosing a team of k
players from a class of n pupils); and there is an algebraic proof, from the formula.
We give a few simple ones.


(3.2.2) Fact.

$$\binom{n}{k} = \binom{n}{n-k}.$$


FIRST PROOF. Choosing a team of k from a class of n is equivalent to choosing the
n − k people to leave out.


SECOND PROOF. It’s obvious from the last formula in the box.


(3.2.3) Fact.

$$k\binom{n}{k} = n\binom{n-1}{k-1}.$$


FIRST PROOF. We choose a team of k and designate one team member as captain.
There are $\binom{n}{k}$ possible teams and, for each team, there are k choices for the captain.
Alternatively, we could choose the captain first (in n possible ways), and then the
remainder of the team (k − 1 from the remaining n − 1 class members).

Note that this is an application of the ‘double counting’ principle described in
Section 2.6.


SECOND PROOF. Try it yourself!


You will find that the SECOND PROOFs above probably come more naturally to
you. For this reason, I’ll concentrate on the combinatorial style of proof for the next
couple of results. Remember that the algebraic proof is not always appropriate or
even possible: sometimes we won’t have a formula for the numbers in question,
or the formula is too complex. (See the discussion of Stirling numbers in Section 5.3
for examples of this.)


(3.2.4) Fact.

$$\binom{n+1}{k} = \binom{n}{k-1} + \binom{n}{k}.$$


PROOF. We have a class of n + 1 pupils, one of whom is somehow ‘distinguished’, and
wish to pick a team of k. We could either include the distinguished pupil (in which
case we must choose the other k − 1 team members from the remaining n pupils),
or leave him out (when we have to choose the whole team from the remaining n).


(3.2.5) Fact.

$$\sum_{k=0}^{n} \binom{n}{k} = 2^n.$$


PROOF. This one is easy: there are $2^n$ subsets altogether (of arbitrary size).


(3.2.6) Fact.

$$\sum_{k=0}^{n} \binom{n}{k}^2 = \binom{2n}{n}.$$


PROOF. The right-hand side is the number of ways of picking a team of n from a
class of 2n. Now suppose that, of the 2n pupils, n are girls and n are boys. In how
many ways can we pick a team of k girls and n − k boys? Obviously this number is
$\binom{n}{k}\binom{n}{n-k}$, which is equal to $\binom{n}{k}^2$ by Fact 3.2.2. The result now follows on summing over k.


The definition of the binomial coefficient $\binom{n}{k}$ actually makes sense for any non-negative
integers n and k: if k > n, then there are no k-subsets of an n-set, and
$\binom{n}{k} = 0$. The (first) formula gives the right answer, since if k > n then one of the
factors in the numerator is zero. (This cannot be assumed, since the argument we
gave is only valid if k ≤ n.) However, the second formula makes no sense (unless,
very dubiously, we assume that the factorial of a negative integer is infinite!).
Facts 3.2.2-4 above remain valid with this more general interpretation. (You
should check this.)

Sometimes it is convenient to widen the definition still further. For example, if
k < 0, we should define $\binom{n}{k} = 0$, in order that Fact 3.2.2 should hold in general. We’ll
see in Chapter 4 that it is possible to relax the requirement that n is a non-negative
integer even further. The most general definition, using the formula, works for any
real number n and any integer k: we set

$$\binom{n}{k} = \begin{cases} \dfrac{n(n-1)\cdots(n-k+1)}{k!} & \text{if } k \ge 0; \\ 0 & \text{if } k < 0. \end{cases}$$
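The general definition is straightforward to compute with. The sketch below is an illustration only: it uses exact rational arithmetic, so n is restricted to integers and fractions rather than arbitrary reals, and the name `gen_binomial` is hypothetical.

```python
from fractions import Fraction

def gen_binomial(n, k):
    """Binomial coefficient for rational n and any integer k (general definition)."""
    if k < 0:
        return Fraction(0)
    result = Fraction(1)
    for i in range(k):
        # Multiply in the factor (n - i) / (i + 1); the empty product (k = 0) is 1.
        result = result * (Fraction(n) - i) / (i + 1)
    return result

print(gen_binomial(5, 2), gen_binomial(3, 5), gen_binomial(-1, 3), gen_binomial(Fraction(1, 2), 2))
```

For non-negative integers n the function agrees with the count of k-subsets, including the value 0 when k > n.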


3.3. The Binomial Theorem and Pascal’s Triangle 


Fact 3.2.5 above can be generalised to the celebrated Binomial Theorem.¹ A binomial
is a polynomial with two terms; the Binomial Theorem states that, if a power of a
binomial is expanded, the coefficients in the resulting polynomial are the binomial
coefficients (from which, obviously, they get their name).


(3.3.1) Binomial Theorem

$$(1+t)^n = \sum_{k=0}^{n} \binom{n}{k} t^k.$$


FIRST PROOF. It’s clear that $(1+t)^n$ is a polynomial in t of degree n. To find the
coefficient of $t^k$, consider the product

$$(1+t)(1+t)\cdots(1+t) \quad (n \text{ factors}).$$

The expansion is obtained by choosing either 1 or t from each factor in all possible
ways, multiplying the chosen terms, and adding all the results. A term $t^k$ is obtained
when t is chosen from k of the factors, and 1 from the other n − k factors. There
are $\binom{n}{k}$ ways of choosing these k factors; so the coefficient of $t^k$ is $\binom{n}{k}$, as claimed.
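The expansion described in this proof can be carried out literally for small n: run through all ways of choosing 1 or t from each factor and record the resulting power of t. The sketch below does this; the name `coefficients_by_choices` is an assumption made for the example.

```python
from collections import Counter
from itertools import product
from math import comb

def coefficients_by_choices(n):
    """Coefficients of (1+t)^n found by choosing 1 (coded 0) or t (coded 1) per factor."""
    # Each tuple of choices contributes the term t^(number of factors giving t).
    counts = Counter(sum(choice) for choice in product((0, 1), repeat=n))
    return [counts[k] for k in range(n + 1)]

n = 5
print(coefficients_by_choices(n))
print([comb(n, k) for k in range(n + 1)])
```

The two printed lists coincide, as the theorem asserts; the enumeration takes $2^n$ steps, so it is a check rather than a practical method.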


SECOND PROOF. The theorem can be proved by induction on n. It is trivially true
for n = 0. Assuming the result for n, we have

$$(1+t)^{n+1} = (1+t)^n \cdot (1+t) = \left(\sum_{k=0}^{n} \binom{n}{k} t^k\right)(1+t);$$

the coefficient of $t^k$ on the right is $\binom{n}{k-1} + \binom{n}{k}$ (the first term coming from $t^{k-1} \cdot t$
and the second from $t^k \cdot 1$); and

$$\binom{n}{k-1} + \binom{n}{k} = \binom{n+1}{k}$$

by Fact 3.2.4.


1 Proved by Sir Isaac Newton in about 1666.

The Binomial Theorem allows the possibility of completely different proofs of
properties of binomial coefficients, some of which are quite difficult to prove in other
ways. Here are a couple of examples. First, a proof of Fact 3.2.3.

Differentiate the Binomial Theorem with respect to t:

$$n(1+t)^{n-1} = \sum_{k=1}^{n} k\binom{n}{k} t^{k-1}.$$

The coefficients of $t^{k-1}$ on the left and right of this equation are $n\binom{n-1}{k-1}$ and $k\binom{n}{k}$
respectively.


(3.3.2) Fact. For n > 0, the numbers of subsets of an n-set of even and of odd
cardinality are equal (viz. $2^{n-1}$).


PROOF. Put t = −1 in the Binomial Theorem to obtain

$$0 = (1-1)^n = \sum_{k=0}^{n} (-1)^k \binom{n}{k},$$

so that

$$\sum_{k \text{ even}} \binom{n}{k} = \sum_{k \text{ odd}} \binom{n}{k}.$$

But the two sides of this equation are just the numbers of subsets of even, resp. odd,
cardinality.

If n is odd, then k is even if and only if n − k is odd; so complementation sets up
a bijection between the subsets of even and odd size, proving the result. However, in
general, a different argument is required. The map $X \mapsto X \triangle \{n\}$ (that is, if n ∈ X,
then remove it; otherwise put n into X) is a bijection on subsets of {1,...,n} which
changes the cardinality by 1, and hence reverses the parity; so there are equally
many sets of either parity.

The argument can be refined to calculate the number of sets whose size lies in 
any particular congruence class. I illustrate by calculating the number of sets of size 
divisible by 4. I assume that n is a multiple of 8. (The answer takes different forms 
depending on the congruence class of n mod 8.) 


(3.3.3) Proposition. If n is a multiple of 8, then the number of sets of size divisible
by 4 is $2^{n-2} + 2^{(n-2)/2}$.

For example, if n = 8, the number of such sets is $\binom{8}{0} + \binom{8}{4} + \binom{8}{8} = 72 = 2^6 + 2^3$.

PROOF. We let A be the required number, and B the number of sets whose size is
congruent to 2 (mod 4). By Fact 3.3.2, $A + B = 2^{n-1}$.

Now substitute t = i in the Binomial Theorem. Note that $1 + i = \sqrt{2}\,e^{i\pi/4}$, and
so (since n is a multiple of 8), $(1+i)^n = 2^{n/2}$. Thus

$$2^{n/2} = \sum_{k=0}^{n} \binom{n}{k} i^k.$$

Take the real part of the right-hand side, noting that $i^k = 1, i, -1, -i$ according as
k ≡ 0, 1, 2 or 3 (mod 4). We obtain $A - B = 2^{n/2}$. From this and the expression for
A + B above, we obtain the value of A (and that of B).
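The proposition is easy to test numerically. The sketch below compares a direct sum of binomial coefficients with the closed form; the function names are illustrative, and `closed_form` assumes that n is a multiple of 8, as in the statement.

```python
from math import comb

def sizes_divisible_by_4(n):
    """Number of subsets of an n-set whose size is a multiple of 4 (direct sum)."""
    return sum(comb(n, k) for k in range(0, n + 1, 4))

def closed_form(n):
    """The value 2^(n-2) + 2^((n-2)/2); assumes n is a multiple of 8."""
    return 2 ** (n - 2) + 2 ** ((n - 2) // 2)

for n in (8, 16, 24):
    print(n, sizes_divisible_by_4(n), closed_form(n))
```

For n = 8 both expressions give 72, matching the example above.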

REMARK. By taking the imaginary part of the equation, we find the numbers of sets
with size congruent to 1, or to 3, mod 4.


The binomial coefficients are often written out in the form of a triangular array,
known as Pascal’s Triangle:²


                    1
                  1   1
                1   2   1
              1   3   3   1
            1   4   6   4   1
          1   5  10  10   5   1
        1   6  15  20  15   6   1
      1   7  21  35  35  21   7   1
    1   8  28  56  70  56  28   8   1


Thus, $\binom{n}{k}$ is the $k$th element in the $n$th row, where both the rows and the elements
in them are numbered starting at zero. Fact 3.2.4 shows that each internal element
of the triangle is the sum of the two elements above it (i.e., above and to the left
and right). Moreover, the borders of the triangle are filled with the number 1 (since
$\binom{n}{0} = \binom{n}{n} = 1$). With these two rules, it is very easy to continue the triangle as far
as necessary. This suggests that Pascal’s Triangle is an efficient tool for calculating
binomial coefficients. (See Exercise 7.)
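
The two rules just described translate directly into a short program. This is a minimal sketch; the function name `pascal_rows` is an illustrative choice, not notation from the text.

```python
def pascal_rows(last_row):
    """Return rows 0..last_row of Pascal's Triangle as lists of integers."""
    rows = [[1]]
    for _ in range(last_row):
        prev = rows[-1]
        # Borders are 1; each internal entry is the sum of the two above it.
        inner = [prev[j - 1] + prev[j] for j in range(1, len(prev))]
        rows.append([1] + inner + [1])
    return rows

for row in pascal_rows(8):
    print(" ".join(str(x) for x in row))
```

Each new row costs only as many additions as it has internal entries, which is the sense in which the triangle is an efficient way to tabulate binomial coefficients.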


3.4. Project: Congruences of binomial coefficients 


A popular school project is to examine the patterns formed by the entries of Pascal’s 
Triangle modulo a prime. For example, the first eight rows mod 2 are as follows: 


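       1
      1 1
     1 0 1
    1 1 1 1
   1 0 0 0 1
  1 1 0 0 1 1
 1 0 1 0 1 0 1
1 1 1 1 1 1 1 1

If T consists of the first $2^n$ rows, then the first $2^{n+1}$ rows look like

   T
 T 0 T

where 0 stands for a block of zeros.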


Thus the pattern has a ‘self-similarity’ of the kind more usually associated with 
fractals than with combinatorics! A similar pattern holds for congruence modulo 
other primes, except that the copies of T are multiplied by the entries of the p-rowed 
Pascal triangle. 


2 Not surprisingly, this object was known long before Pascal. I owe to Robin Wilson the information 
that it appears in the works of the Majorcan theologian Ramon Llull (1232-1316). Llull also 
gives tables of combinations and mechanical devices for generating them, complete graphs, trees, 
etc. However, combinatorics for him was only a tool in his logical system, and logic was firmly 
subservient to theology. In his first major work, a commentary on Al-Ghazali, he says, ‘We will speak 
briefly of Logic, since we should speak of God.’

The mathematical formulation is remarkably simple. It was discovered by Lucas
in the nineteenth century.


(3.4.1) Lucas’ Theorem
Let p be prime, and let $m = a_0 + a_1 p + \cdots + a_k p^k$, $n = b_0 + b_1 p + \cdots + b_k p^k$,
where $0 \le a_i, b_i < p$ for $i = 0, \ldots, k-1$. Then

$$\binom{m}{n} \equiv \prod_{i=0}^{k} \binom{a_i}{b_i} \pmod{p}.$$

NOTE. We assume here the usual conventions for binomial coefficients, in particular, $\binom{a}{b} = 0$ if $a < b$.


PROOF. It suffices to show that, if m = cp + a and n = dp + b, where $0 \le a, b < p$, then

$$\binom{m}{n} \equiv \binom{c}{d}\binom{a}{b} \pmod{p}.$$

For $a = a_0$, $b = b_0$, and $c = a_1 + \cdots + a_k p^{k-1}$, $d = b_1 + \cdots + b_k p^{k-1}$; and then induction finishes the
job.

This assertion can be proved directly, but there is a short proof using the Binomial Theorem.
The key is the fact that, if p is prime, then

$$(1 + t)^p \equiv 1 + t^p \pmod{p}.$$

This is because each binomial coefficient $\binom{p}{i}$, for $1 \le i \le p-1$, is a multiple of p, so all intermediate
terms in the Binomial Theorem vanish mod p. (For $\binom{p}{i} = p!/i!(p-i)!$, and p divides the numerator
but not the denominator.) Thus (congruence mod p):


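$$\begin{aligned}
(1+t)^m &= (1+t)^{cp}(1+t)^a \\
        &\equiv (1+t^p)^c (1+t)^a \\
        &= \sum_{i=0}^{c} \binom{c}{i} t^{ip} \cdot \sum_{j=0}^{a} \binom{a}{j} t^{j}.
\end{aligned}$$

Since $0 \le a, b < p$, the only way to obtain a term in $t^n = t^{dp+b}$ in this expression is to take the
term i = d in the first sum and the term j = b in the second; this gives

$$\binom{m}{n} \equiv \binom{c}{d}\binom{a}{b} \pmod{p},$$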


as required. 
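
Lucas’ Theorem is easy to test by machine. The sketch below computes the right-hand side digit by digit and compares it with the binomial coefficient reduced mod p; the helper names `digits` and `lucas` are illustrative, and the comparison only covers small values.

```python
from math import comb

def digits(x, p):
    """Base-p digits of x, least significant first."""
    out = []
    while x:
        x, r = divmod(x, p)
        out.append(r)
    return out

def lucas(m, n, p):
    """C(m, n) modulo a prime p, as the product given by Lucas' Theorem."""
    a, b = digits(m, p), digits(n, p)
    width = max(len(a), len(b))
    a += [0] * (width - len(a))     # pad with leading zero digits
    b += [0] * (width - len(b))
    result = 1
    for ai, bi in zip(a, b):
        result = result * comb(ai, bi) % p   # comb(ai, bi) is 0 when ai < bi
    return result

for p in (2, 3, 5, 7):
    for m in range(60):
        for n in range(m + 1):
            assert lucas(m, n, p) == comb(m, n) % p
```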


3.5. Permutations 


There are two ways of regarding a permutation, which I will call ‘active’ and 
‘passive’. Let X be a finite set. A permutation of X , in the active sense, is a 
one-to-one mapping from X to itself. For the passive sense, we assume that there 
is a natural ordering of the elements of X, say $\{x_1, x_2, \ldots, x_n\}$. (For example, X
might be $\{1, 2, \ldots, n\}$.) Then the passive representation of the permutation $\pi$ is the
ordered n-tuple $(\pi(x_1), \pi(x_2), \ldots, \pi(x_n))$.³

In the preceding paragraph, I wrote $\pi(x)$ for the result of applying the function
$\pi$ to the element x. However, in the algebraic theory of permutations, we often have
to compose permutations, i.e., apply one and then the other. In order that the result
of applying first $\pi_1$ and then $\pi_2$ can be called $\pi_1\pi_2$, it is more natural to denote the
image of x under $\pi$ as $x\pi$. Then

$$x(\pi_1\pi_2) = (x\pi_1)\pi_2,$$

which looks like a kind of associative law!⁴


As is (I hope) familiar to you, the set of all permutations of {1,...,n}, equipped 
with the operation of composition, is a group. It is known as the symmetric group 
of degree n, denoted by $S_n$ (or sometimes Sym(n)). The symmetric groups form one
of the oldest and best-loved families of groups. 


From now on, we take X = {1,2,...,n}. 
A permutation $\pi$ can be represented in so-called two-line notation as

$$\begin{pmatrix} 1 & 2 & \ldots & n \\ 1\pi & 2\pi & \ldots & n\pi \end{pmatrix}.$$

The top row of this symbol can be in any order, as long as $x\pi$ is directly under x


for all x. If the top row is in natural order, then the bottom row is the passive form 
of the permutation. 


(3.5.1) Proposition. The number of permutations of an n-set is n!. 


PROOF. Take the top row of the two-rowed symbol to be (1 2 ... n). Then there
are n choices for the first element in the bottom row; n — 1 choices for the second 
(anything except the first chosen element); and so on. 


Note that this formula is correct when n = 0: the only permutation of the empty 
set is the ‘empty function’. 

There is another, shorter, representation of a permutation, the cycle form. A 
cycle, or cyclic permutation, is a permutation of a set X which maps 


$$x_1 \mapsto x_2 \mapsto \cdots \mapsto x_n \mapsto x_1,$$


where $x_1, \ldots, x_n$ are all the elements of X in some order. It is represented as
$(x_1\ x_2\ \ldots\ x_n)$ (not to be confused with the passive form of a permutation!) The
cycle notation is not unique: we can start at any point, so $(x_i\ \ldots\ x_n\ x_1\ \ldots\ x_{i-1})$ represents
the same cycle.


3 In the nineteenth century, it was more usual to refer to a passive permutation as a permutation, 
synonymous with ‘rearrangement’. An active permutation was called a substitution. 


4 We say that permutations act on the right if they compose according to this rule. 

(3.5.2) Proposition. Any permutation can be written as the composition of cycles on
pairwise disjoint subsets. The representation is unique, apart from the order of the 
factors, and the starting-points of the cycles. 


The proof of this theorem is algorithmic. Let $\pi$ be a permutation of X.


(3.5.3) Decomposition into disjoint cycles
WHILE there is a point of X not yet assigned to a cycle,
• choose any such point x;
• let m be the least positive integer such that $x\pi^m = x$;
• construct the cycle $(x\ x\pi\ \ldots\ x\pi^{m-1})$,
RETURN the product of all cycles constructed.


Proof. In the algorithm, we use the notation $\pi^m$ for the composition of m copies of
$\pi$. We first have to show that the construction makes sense, that is, $(x\ x\pi\ \ldots\ x\pi^{m-1})$
really is a cycle. This could only fail if the sequence of elements contains a repetition.
But, if $x\pi^i = x\pi^j$, where $0 \le i < j < m$, then (because $\pi$ is one-to-one) it holds that
$x = x\pi^{j-i}$; but this contradicts the choice of m as least integer such that $x\pi^m = x$.

Next, we establish that the cycles use disjoint sets of points. Suppose that $x\pi^i = y\pi^j$,
and suppose that x is chosen before y. If $y\pi^m = y$, then $x\pi^{i+m-j} = y\pi^m = y$,
contradicting the fact that y (when chosen) doesn’t already lie in a cycle.

It is clear that any point of X lies in one of the chosen cycles. Finally, the
composition of all these cycles is equal to $\pi$. For, given a point z, there is a unique
y and i such that $z = y\pi^i$. Then the cycle containing y agrees with $\pi$ in mapping z
to $y\pi^{i+1}$, and all the other cycles have no effect on z.


EXAMPLE. The permutation $\begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 \\ 3 & 6 & 4 & 1 & 5 & 2 \end{pmatrix}$, in cycle notation, is (1 3 4)(2 6)(5). This
is just one of 36 different expressions: there are 3! = 6 ways to order the three cycles,
and 3·2·1 = 6 choices of starting points.
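
Algorithm 3.5.3 can be sketched in a few lines. Here the permutation is stored as a dictionary sending each point to its image (an assumption of this sketch, as is the name `disjoint_cycles`), and the test permutation is the one with cycle form (1 3 4)(2 6)(5) from the example.

```python
def disjoint_cycles(perm):
    """Decompose a permutation into disjoint cycles.

    `perm` is a dict sending each point x to its image under the permutation.
    """
    assigned = set()
    cycles = []
    for x in perm:                  # "choose any such point x"
        if x in assigned:
            continue
        cycle = [x]
        y = perm[x]
        while y != x:               # stops at the least m returning to x
            cycle.append(y)
            y = perm[y]
        assigned.update(cycle)
        cycles.append(tuple(cycle))
    return cycles

pi = {1: 3, 2: 6, 3: 4, 4: 1, 5: 5, 6: 2}
print(disjoint_cycles(pi))
```

Starting each cycle at the first unassigned point fixes one of the many equivalent expressions; other starting points or orders of the cycles describe the same permutation.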


3.6. Estimates for factorials 
Since many kinds of combinatorial objects (for example, binomial coefficients) can 
be expressed in terms of factorials, it is often important to know roughly how large 
n! is. In Exercise 3 of the last chapter, upper and lower bounds were found by ad 
hoc methods. In this section, a more systematic approach will yield better estimates. 
I will prove: 
(3.6.1) Theorem.
$$n\log n - n + 1 \le \log n! \le n\log n - n + (\log(n+1) + 2 - \log 2).$$
From this, it follows that 


$$\log n! = n\log n - n + O(\log n).$$

This is weaker than an asymptotic estimate for n! itself: the exponentials of the
lower and upper bounds are $e(n/e)^n$ and $\tfrac{1}{2}(n+1)e^2(n/e)^n$, which differ by a factor
of (n + 1)e/2. A more precise estimate (not proved here) is:


(3.6.2) Stirling’s Formula
$$n! = \sqrt{2\pi n}\,\left(\frac{n}{e}\right)^{n}\left(1 + O\!\left(\frac{1}{n}\right)\right).$$


PROOF OF THEOREM. The main tool is shown in the pictures of Fig. 3.1. 


[Fig. 3.1. Sums and integrals: panels (a) and (b)]


Since $y = \log x$ is an increasing function of x for all positive x (its derivative, 1/x, is
positive), the tops of the rectangles in Fig. 3.1(a) all lie above the curve $y = \log x$,
and those in Fig. 3.1(b) lie below the curve. In other words,

$$\int_1^n \log x\,dx \le \sum_{i=2}^{n} \log i \le \int_2^{n+1} \log x\,dx.$$

The term in the middle is log n!. So

$$n\log n - n + 1 \le \log n! \le (n+1)\log(n+1) - (n+1) - 2\log 2 + 2.$$

The lower bound is exactly what is needed. For the upper bound, note that

$$\log(n+1) - \log n = \int_n^{n+1} \frac{dx}{x} \le \frac{1}{n},$$

so $n\log(n+1) \le n\log n + 1$. Combining this with the upper bound, we obtain
$$\log n! \le \log(n+1) + n\log n - n + 2 - 2\log 2,$$
which implies the stated upper bound, since $2 - 2\log 2 < 2 - \log 2$.
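
A quick numerical comparison shows how tight the bounds of Theorem 3.6.1 are. This is a sketch only: `log_factorial_bounds` is an illustrative name, logarithms are natural, and log n! is obtained from the standard library function `math.lgamma`, using the fact that lgamma(n + 1) = log n!.

```python
from math import lgamma, log

def log_factorial_bounds(n):
    """Lower and upper bounds for log n! as stated in Theorem 3.6.1."""
    lower = n * log(n) - n + 1
    upper = n * log(n) - n + (log(n + 1) + 2 - log(2))
    return lower, upper

for n in (1, 2, 5, 10, 100, 1000):
    lower, upper = log_factorial_bounds(n)
    exact = lgamma(n + 1)           # log n!
    # A small tolerance guards against rounding when the bound is attained.
    assert lower <= exact + 1e-9 and exact <= upper
    print(n, round(lower, 3), round(exact, 3), round(upper, 3))
```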


If you are interested, you could regard the proof of Stirling’s Formula as a
project.⁵ A lower bound only slightly weaker than Stirling’s is given in Exercise 11.

Exercise 12 gives an example of the use of Stirling’s Formula to estimate a 
binomial coefficient. A weaker result can be obtained much more easily: 


5 An accessible proof can be found in Alan Slomson, Introduction to Combinatorics (1991). 

(3.6.3) Proposition.

$$2^{2n}/(2n+1) \le \binom{2n}{n} \le 2^{2n}.$$

PROOF. Immediate from the fact that the 2n + 1 binomial coefficients $\binom{2n}{i}$, for
i = 0, ..., 2n, have sum $2^{2n}$, and the middle one is the largest.


3.7. Selections 
In how many ways can one select k objects from a set of size n? 


The answer differs according to the terms of the problem, as we saw in Chapter 2. 
Specifically, is the order in which the objects are chosen significant (a permutation) 
or not (a combination)? and is the same object permitted to feature more than once 
in the selection, or not? (The term ‘permutation’ is used in a more general sense
than in the last section: this is what might more accurately be called a ‘partial
permutation’.)


(3.7.1) Theorem. The number of selections of k objects from a set of n objects is 
given by the following table: 


Permutations and combinations

                          Order significant          Order not significant
Repetitions allowed       $n^k$                      $\binom{n+k-1}{k}$
Repetitions not allowed   $n(n-1)\cdots(n-k+1)$      $\binom{n}{k}$


PROOF. For the column ‘order significant’, these are straightforward. If repetitions
are allowed, there are n choices for each of the k objects; if repetitions are not
allowed, there are n choices for the first, $n - 1$ for the second, ..., $n - k + 1$ for the $k$th.

For ‘order not significant’, if repetitions are not allowed, we are counting the
k-subsets of an n-set, which we already know how to do. The final entry is a bit
harder.
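
All four entries of the table can be confirmed by enumeration for small n and k. The sketch below relies on the four standard iterators in Python’s `itertools` module, which happen to correspond exactly to the four kinds of selection; the function name `selection_counts` and the dictionary keys are illustrative.

```python
from itertools import (combinations, combinations_with_replacement,
                       permutations, product)
from math import comb, perm

def selection_counts(n, k):
    """Count the four kinds of selection of k objects from n by enumeration."""
    objects = range(n)
    return {
        ("ordered", "repetitions"):
            sum(1 for _ in product(objects, repeat=k)),
        ("ordered", "no repetitions"):
            sum(1 for _ in permutations(objects, k)),
        ("unordered", "repetitions"):
            sum(1 for _ in combinations_with_replacement(objects, k)),
        ("unordered", "no repetitions"):
            sum(1 for _ in combinations(objects, k)),
    }

for n in range(1, 6):
    for k in range(0, 5):
        counts = selection_counts(n, k)
        assert counts[("ordered", "repetitions")] == n ** k
        assert counts[("ordered", "no repetitions")] == perm(n, k)
        assert counts[("unordered", "repetitions")] == comb(n + k - 1, k)
        assert counts[("unordered", "no repetitions")] == comb(n, k)
```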


(3.7.2) Lemma. The number of choices of k objects from n with repetitions allowed 
and order not significant is equal to the number of ways of choosing n non-negative 
integers whose sum is k. 


PROOF. Given a choice of k objects from the set $a_1, \ldots, a_n$, let $x_i$ be the number of
times that the object $a_i$ gets chosen. Then $x_i \ge 0$, $\sum_{i=1}^{n} x_i = k$. Conversely, given
$(x_1, \ldots, x_n)$, form a selection by choosing object $a_i$ just $x_i$ times.

(3.7.3) Lemma. The number of n-tuples of non-negative integers $x_1, \ldots, x_n$ with
$x_1 + \cdots + x_n = k$ is $\binom{n+k-1}{n-1} = \binom{n+k-1}{k}$.

PROOF. Consider the following correspondence. Put $n + k - 1$ spaces in a row,
and fill $n - 1$ of them with markers. Let $x_1$ be the number of spaces before the
first marker; $x_i$ the number of spaces between the $(i-1)$th and $i$th marker, for
$2 \le i \le n-1$; and $x_n$ the number of spaces after the $(n-1)$th marker. Then $x_i \ge 0$,
$\sum x_i = (n+k-1) - (n-1) = k$. Conversely, given $x_1, \ldots, x_n$, put markers after $x_1$
spaces, after $x_2$ more spaces, ..., after $x_{n-1}$ more spaces (so that $x_n$ spaces remain).


EXAMPLE. Suppose that n = 3, k = 4. The pattern of spaces and markers

□ □ ⊠ □ ⊠ □

corresponds to the values $x_1 = 2$, $x_2 = 1$, $x_3 = 1$. Conversely, the values $(x_1, x_2, x_3) = (0, 0, 4)$
correspond to the pattern

⊠ ⊠ □ □ □ □.


Now the number of ways of choosing the positions of the markers is
$\binom{n+k-1}{n-1} = \binom{n+k-1}{k}$, as claimed.
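
The correspondence in the proof can be run in the forward direction: choose the marker positions, then read off the gaps. This is a sketch assuming n ≥ 1; the name `tuples_from_markers` is illustrative, and positions are numbered from 0.

```python
from itertools import combinations
from math import comb

def tuples_from_markers(n, k):
    """Yield every n-tuple of non-negative integers with sum k.

    Each tuple comes from one choice of n - 1 marker positions among
    n + k - 1 places, as in the proof of Lemma 3.7.3.
    """
    places = n + k - 1
    for markers in combinations(range(places), n - 1):
        bounds = (-1,) + markers + (places,)
        # Number of unfilled spaces between consecutive markers (or ends).
        yield tuple(bounds[i + 1] - bounds[i] - 1 for i in range(n))

solutions = list(tuples_from_markers(3, 4))
assert len(solutions) == len(set(solutions)) == comb(6, 2)
assert all(sum(x) == 4 for x in solutions)
assert (2, 1, 1) in solutions and (0, 0, 4) in solutions
```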


REMARK. Using the extended definition of binomial coefficients, the number of
selections with repetitions allowed and order not significant can be written

$$(-1)^k \binom{-n}{k}.$$


A common puzzle is to find as many words as possible which can be formed 
from the letters of a given word. Of course, the crucial feature of this problem 
is that the words formed should belong to some given human language (i.e., they
should be found in a standard dictionary). There are two possible strategies for this 
problem. We could either form all potential words (all permutations of whatever 
length), and look each one up in the dictionary; or go through the entire dictionary, 
and check whether each word uses a subset of the given letters. In order to decide 
which strategy is more efficient, we need to answer a theoretical question (how many 
permutations are there?) and some practical ones (how many words are there in the 
dictionary, and how fast can we look them up?) 

We will solve a special case of the theoretical question. Assume that the n given 
letters are all distinct. We will call any ordered selection without repetition from 
these letters a word (without judging its legality — note in particular that we include 
the ‘empty word’ with no letters, which doesn’t appear in any dictionary⁶).


(3.7.4) Proposition. The number of ordered selections without repetition from a set
of n objects is $\lfloor e \cdot n! \rfloor$, where e is the base of natural logarithms.


6 If it did, how would you look it up?
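
PROOF. The number f(n) in question is just

$$f(n) = \sum_{k=0}^{n} n(n-1)\cdots(n-k+1) = \sum_{k=0}^{n} \frac{n!}{(n-k)!} = n! \sum_{k=0}^{n} \frac{1}{k!}.$$

Since $e = \sum_{k \ge 0} 1/k!$, it follows that

$$e \cdot n! - f(n) = \frac{1}{n+1} + \frac{1}{(n+1)(n+2)} + \cdots < \frac{1}{n+1} + \frac{1}{(n+1)^2} + \cdots = \frac{1}{n} \le 1,$$

so $f(n) = \lfloor e \cdot n! \rfloor$.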
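
The proposition can be checked directly for small n. The sketch below sums the falling factorials and compares the result with the floor of e·n!; it starts at n = 1, where the estimate in the proof applies, and stops at n = 12 so that floating-point rounding cannot affect the floor. The name `ordered_selections` is illustrative.

```python
from math import e, factorial, floor, perm

def ordered_selections(n):
    """Number of ordered selections without repetition, of every length."""
    return sum(perm(n, k) for k in range(n + 1))

for n in range(1, 13):
    assert ordered_selections(n) == floor(e * factorial(n))
```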

If the allowed letters contain repetitions, the problem is harder. It is possible to
derive a general formula; but it is probably easier to argue ad hoc in a particular 
case, as the next example shows. 


EXAMPLE. How many words can be made from the letters of the word FLEECE? 


We count words according to the number of occurrences of the letter E. If there
is at most one E, we can invoke the previous result: there are 24 + 24 + 12 + 4 + 1 = 65
such words (including the empty word). If there are two Es, let us imagine first that
they are distinguishable; then there are 2 + 3·6 + 3·24 + 120 = 212 possibilities. (For
example, with four letters altogether, we choose two of the remaining three letters
in $\binom{3}{2} = 3$ ways, and arrange the resulting four in 4! = 24 ways.) Since the two
Es are in fact indistinguishable, we have to halve this number, giving 106 words.
Finally, with three distinguishable Es, there would be 6 + 3·24 + 3·120 + 720 = 1158
possibilities, and so there are 1158/6 = 193 words of this form. So the total is
65 + 106 + 193 = 364 words.
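
The count can be confirmed by the first of the two strategies mentioned earlier: form every arrangement and discard duplicates. This brute-force sketch (the name `distinct_words` is illustrative) is feasible only because the word is short.

```python
from itertools import permutations

def distinct_words(letters):
    """All distinct ordered selections, of every length, from the given letters."""
    words = set()
    for length in range(len(letters) + 1):
        for arrangement in permutations(letters, length):
            words.add("".join(arrangement))   # the set removes repeats
    return words

words = distinct_words("FLEECE")
assert len(words) == 364
# The three cases of the argument: at most one E, two Es, three Es.
assert sum(1 for w in words if w.count("E") <= 1) == 65
assert sum(1 for w in words if w.count("E") == 2) == 106
assert sum(1 for w in words if w.count("E") == 3) == 193
```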


3.8. Equivalence and order 


A relation on a set X is normally regarded as a property which may or may not 
hold between any two given elements of X. Typical examples are ‘equal’, ‘less than’, 
‘divides’, etc. The definition comes as a surprise at first: a relation on X is a subset
of $X^2$ (the set of ordered pairs of elements of X). What is the connection? Of
course, a relation in the familiar sense is completely determined by the set of pairs
which satisfy it; and conversely, given any set of pairs, we could imagine a property
which was true for those pairs and false for all others.

This dual interpretation causes a small problem of notation. In general, if
$R \subseteq X^2$ is a relation, we could write x R y to have the same meaning as $(x, y) \in R$.
This is consistent with the usual notations x = y, x < y, x | y, etc. But we don’t
reverse the procedure and write $(x, y) \in {=}$, $(x, y) \in {<}$, etc.!

Here are some important properties which a relation R may or may not have:
• R is reflexive if, for all $x \in X$, we have $(x, x) \in R$.
• R is irreflexive if, for all $x \in X$, we have $(x, x) \notin R$. (This is not the same as
saying ‘R is not reflexive’.)
• R is symmetric if, for all $x, y \in X$, $(x, y) \in R$ implies $(y, x) \in R$.
• R is antisymmetric if, whenever $(x, y) \in R$ and $(y, x) \in R$ both hold, then x = y.
• R is transitive if, for all $x, y, z \in X$, $(x, y) \in R$ and $(y, z) \in R$ together imply
$(x, z) \in R$.

For example, the relation of equality is reflexive, symmetric and transitive; the 
relation ‘less than or equal’ is reflexive, antisymmetric and transitive; the relation 
‘less than’ is irreflexive, antisymmetric and transitive; and the relation of adjacency
in a graph (as described in Section 2.5) is irreflexive and symmetric. 

Note that there are two ways of modelling an order relation: as ‘less than’ 
(irreflexive) or as ‘less than or equal’ (reflexive). 
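
Since a relation is just a set of ordered pairs, each of these properties becomes a one-line test. The sketch below represents a relation on X as a Python set of pairs; the function names are illustrative, and the examples are the familiar relations on {1, ..., 6}.

```python
def is_reflexive(X, R):
    return all((x, x) in R for x in X)

def is_irreflexive(X, R):
    return all((x, x) not in R for x in X)

def is_symmetric(X, R):
    return all((y, x) in R for (x, y) in R)

def is_antisymmetric(X, R):
    return all(x == y for (x, y) in R if (y, x) in R)

def is_transitive(X, R):
    return all((x, w) in R for (x, y) in R for (z, w) in R if y == z)

X = range(1, 7)
equal = {(x, y) for x in X for y in X if x == y}
less_eq = {(x, y) for x in X for y in X if x <= y}
less = {(x, y) for x in X for y in X if x < y}
divides = {(x, y) for x in X for y in X if y % x == 0}

assert is_reflexive(X, equal) and is_symmetric(X, equal) and is_transitive(X, equal)
assert is_reflexive(X, less_eq) and is_antisymmetric(X, less_eq) and is_transitive(X, less_eq)
assert is_irreflexive(X, less) and is_antisymmetric(X, less) and is_transitive(X, less)
assert is_reflexive(X, divides) and is_antisymmetric(X, divides) and not is_symmetric(X, divides)
```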


We proceed to define some important classes of relations in terms of these 
properties. 

An equivalence relation is a reflexive, symmetric and transitive relation. It 
turns out that equivalence relations describe partitions of a set. Let R be an 
equivalence relation on X. For $x \in X$, the equivalence class containing x is the set
$R(x) = \{y \in X : (x, y) \in R\}$. A partition of X is a family of pairwise disjoint,
non-empty subsets whose union is X — thus, every point of X lies in exactly one 
of the sets. 


(3.8.1) Theorem. Let R be an equivalence relation on X. Then the equivalence 
classes of R form a partition of X. Conversely, given any partition of X, there is 
a unique equivalence relation on X whose equivalence classes are the parts of the 
partition.


Proof. Let R be an equivalence relation on X. 

• Each equivalence class is non-empty, and their union is X; for, by reflexivity,
each point $x \in X$ lies in the class R(x), and conversely, R(x) contains x.

• The equivalence classes are pairwise disjoint. For suppose that two classes R(x),
R(y) have a common point z. We will show that R(x) = R(y). By definition,
$(x, z), (y, z) \in R$. By symmetry, $(z, y) \in R$; then, by transitivity, $(x, y) \in R$.
Now, to prove two sets equal, we have to show that each set contains the other.
So suppose that $w \in R(y)$. Then $(y, w) \in R$. Since $(x, y) \in R$, transitivity implies
that $(x, w) \in R$, or $w \in R(x)$. So $R(y) \subseteq R(x)$. The reverse inclusion is similar.


For the converse, suppose that the sets $Y_1, Y_2, \ldots$ form a partition of X. Define
a relation R by the rule that $(x, y) \in R$ if there is an index i such that x and y both
lie in $Y_i$. It is not difficult to prove that R is an equivalence relation. For example, to
show reflexivity, take $x \in X$; by assumption there is a (unique) i such that $x \in Y_i$;
so $(x, x) \in R$. The other two properties are an exercise.


Thus, for example, the number of partitions of a set is equal to the number of 
equivalence relations on that set. We will study these numbers (the Bell numbers) in 
Section 3.11. 

We turn now to order relations. As mentioned above, there are two ways to
model an order relation: we use the reflexive one (taking ‘less than or equal’) as the 
prototype. 

A relation R on X is a partial order if it is reflexive, antisymmetric, and transitive. 

Note that there may be some pairs of elements which are not comparable at all 
(i.e. neither $(x, y) \in R$ nor $(y, x) \in R$ holds). A relation R is said to satisfy trichotomy
if, for any $x, y \in X$, one of the cases $(x, y) \in R$, x = y, or $(y, x) \in R$ holds. Then
a relation R is a total order if it is a partial order which satisfies trichotomy. We 
commonly omit the word ‘total’ here; an order is a total order. 


(3.8.2) Proposition. The number of orders of an n-set is equal to n!. 


REMARK. In fact we show that, given any order on an n-set, its elements can be
numbered $x_1, \ldots, x_n$ so that $(x_i, x_j) \in R$ if and only if $i \le j$; and there is a unique
way of doing this. In other words, the axiomatic definition of order agrees with our
expectations!


PROOF. We show first that there is a ‘last’ element of X, an element x such that,
if $(x, y) \in R$, then y = x. Suppose that no such x exists. Then, for any x, there
exists $y \ne x$ such that $(x, y) \in R$. Start with $x = x_1$, and choose $x_2, x_3, \ldots$ according
to this principle (so that $(x_i, x_{i+1}) \in R$ for all i). By transitivity, $(x_i, x_j) \in R$ for
all i < j, and $x_i \ne x_{i+1}$ for all i. Now X is finite, so the sequence eventually
bites its tail; that is, there exists i < j so that $x_i = x_j$. Then $(x_{j-1}, x_j) \in R$, and
$(x_j, x_{j-1}) = (x_i, x_{j-1}) \in R$ since $i < j - 1$. By antisymmetry, $x_j = x_{j-1}$, contrary to
the construction.

Now there cannot be more than one ‘last’ element, since, for any x and y, either
$(x, y) \in R$ or $(y, x) \in R$ by trichotomy.

Call the last element $x_n$; then, by trichotomy, $(x, x_n) \in R$ for all $x \in X$.

Arguing by induction, there is a unique way to label the remaining elements as
$x_1, \ldots, x_{n-1}$, in accordance with the assertion. The proposition is proved.

We see that orders on X are equinumerous with permutations of X; indeed, 
our representation of an order looks like the ‘passive’ form of a permutation. But 
there is no ‘canonical’ bijection between orders and permutations; we need one 
distinguished order to set up this correspondence. (Then any order R corresponds 
to the permutation which takes the distinguished order into R.) 


In the next section, we will consider a generalisation of (partial) orders. A 
relation R is a partial preorder (or pre-partial order) if it is reflexive and transitive;
that is, we relax the condition of antisymmetry. Exercise 18 outlines a proof that, if R is a
partial preorder on X, then there is a natural way to define an equivalence relation
on X so that the set of equivalence classes is partially ordered. (We set $x \equiv y$ if
both x R y and y R x hold: think of such x and y as being indistinguishable. Now
the truth of the relation x R y is unaffected if either x or y is replaced by a point
which is indistinguishable from it; so R induces a relation on the equivalence classes
which is still reflexive and transitive, and is also antisymmetric.)

A partial preorder satisfying trichotomy is called a preorder. 


3.9. Project: Finite topologies


Topology is the study of continuity. The term suggests doughnuts, Möbius bands,
and such like. There is, however, an abstract definition of a topology, and it applies
to finite as well as infinite spaces. We are going to translate the meaning of ‘finite
topology’ into something simpler and more combinatorial.


A topology consists of a set X, and a set $\mathcal{T}$ of subsets of X, satisfying the following axioms:
• $\emptyset \in \mathcal{T}$ and $X \in \mathcal{T}$;
• the union of any collection of sets in $\mathcal{T}$ is in $\mathcal{T}$;
• the intersection of any two sets in $\mathcal{T}$ is in $\mathcal{T}$.
Sets in $\mathcal{T}$ are said to be open. The idea is that, if x is a point and U an open set containing x, the
points of U are in some sense ‘close’ to x. (Indeed, U is often called a neighbourhood of x.)


REMARK. It follows by induction from the third axiom that the intersection of any finite number of
members of $\mathcal{T}$ is a member of $\mathcal{T}$. If X is finite, the second axiom need only deal with finite unions,
and so it too can be simplified to the statement that the union of any two sets in $\mathcal{T}$ is in $\mathcal{T}$; then the
axioms are ‘self-dual’. This is not the case in general!


(3.9.1) Theorem. Let X be finite. Then there is a one-to-one correspondence between the topologies 
on X, and the partial preorders (i.e., reflexive and transitive relations) on X.


Thus, describing finite topologies (sets of sets) reduces to the simpler task of describing partial
preorders (sets of pairs). No such correspondence holds for infinite sets!

PROOF. The correspondence is simple to describe; the verification less so.

CONSTRUCTION 1. Let $\mathcal{T}$ be a topology on X. Define a relation R by the rule that $(x, y) \in R$ if every
open set containing x also contains y. It is trivial that R is reflexive and transitive; that is, R is a
partial preorder.

CONSTRUCTION 2. Let R be a partial preorder on X. Call a subset U of X open if, whenever $x \in U$,
we have $R(x) \subseteq U$, where

$$R(x) = \{y : (x, y) \in R\}.$$

Let $\mathcal{T}$ be the set of all open sets. We have to verify that $\mathcal{T}$ is a topology. The first axiom requires
no comment. For the second axiom, let $U_1, U_2, \ldots$ be open, and $x \in \bigcup_i U_i$; then $x \in U_j$ for some j,
whence

$$R(x) \subseteq U_j \subseteq \bigcup_i U_i.$$

For the third axiom, let U and V be open and $x \in U \cap V$. Then $R(x) \subseteq U$ and $R(x) \subseteq V$, and so
$R(x) \subseteq U \cap V$; thus $U \cap V$ is open.

All this argument is perfectly general. It is the fact that we have a bijection which depends on
the finiteness of X. We have to show that applying the two constructions in turn brings us back to
our starting point.

Suppose first that R is a partial preorder, and $\mathcal{T}$ the topology derived from it by Construction 2.
Suppose that $(x, y) \in R$. Then $y \in R(x)$, so every open set containing x also contains y. Conversely,
suppose that every open set containing x also contains y. The set R(x) is itself open (this uses the
transitivity of R: if $z \in R(x)$, then $R(z) \subseteq R(x)$), and so $y \in R(x)$; thus $(x, y) \in R$. Hence the partial
preorder derived from $\mathcal{T}$ by Construction 1 coincides with R. (We still haven’t used finiteness!)

Conversely, let $\mathcal{T}$ be a topology, and R the partial preorder obtained by Construction 1. If
$U \in \mathcal{T}$ and $x \in U$, then $R(x) \subseteq U$; so U is open in the sense of Construction 2. Conversely, suppose
that U is open in the sense of Construction 2, that is, $x \in U$ implies $R(x) \subseteq U$. Now each set R(x) is the intersection of all members
of $\mathcal{T}$ containing x. (This follows from the definition of R in Construction 1.) But there are only
finitely many such open sets (here, at last, we use the fact that X is finite!); and the intersection of
finitely many open sets is open, as we remarked earlier; so R(x) belongs to $\mathcal{T}$. But, by hypothesis, U is
the union of the sets R(x) for all points $x \in U$; and a union of open sets is open, so U belongs to $\mathcal{T}$, as
required.
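
The two constructions are short enough to run on a small example. The sketch below is only an illustration under stated assumptions: open sets are stored as frozensets, Construction 2 tests every subset of X (so it is practical only for very small X), and the function names are illustrative. The chosen topology on {1, 2, 3} is a chain of four open sets.

```python
from itertools import combinations

def preorder_from_topology(X, opens):
    """Construction 1: (x, y) in R iff every open set containing x contains y."""
    return {(x, y) for x in X for y in X
            if all(y in U for U in opens if x in U)}

def topology_from_preorder(X, R):
    """Construction 2: U is open iff x in U implies R(x) is a subset of U."""
    up = {x: frozenset(y for y in X if (x, y) in R) for x in X}
    subsets = (frozenset(c) for r in range(len(X) + 1)
               for c in combinations(X, r))
    return {U for U in subsets if all(up[x] <= U for x in U)}

X = (1, 2, 3)
opens = {frozenset(), frozenset({1}), frozenset({1, 2}), frozenset({1, 2, 3})}
R = preorder_from_topology(X, opens)

# Applying the two constructions in turn returns to the starting point.
assert topology_from_preorder(X, R) == opens
assert preorder_from_topology(X, topology_from_preorder(X, R)) == R
```

In this example R(1) = {1}, R(2) = {1, 2} and R(3) = {1, 2, 3}, so the preorder is a total order, in line with the chain of open sets.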

In the axiomatic development of topology, the next thing one meets after the definition is usually
the so-called ‘separation axioms’. A topology is said to satisfy the axiom $T_0$ if, given any two distinct
points x and y, there is an open set containing one but not the other; it satisfies axiom $T_1$ if, given
distinct x and y, there is an open set containing x but not y (and vice versa).

These two axioms for finite topologies have a natural interpretation in terms of the partial
preorder R. Axiom $T_1$ asserts that R never holds between distinct points x and y; that is, R is
the trivial relation of equality. Construction 2 in the proof of the theorem then shows that every
subset is open. (This is called the discrete topology.) It follows that any stronger separation axiom
(in particular, the so-called ‘Hausdorff axiom’ $T_2$) also forces the topology to be discrete.

Axiom $T_0$ translates into the condition that the relation R is antisymmetric; thus, it is a partial
order. So there is a one-to-one correspondence between $T_0$ topologies on the finite set X and partial
orders on X.


3.10. Project: Cayley’s Theorem on trees 


As we saw at the end of Section 3.8, the number of orderings of an n-set is equal
to the number of permutations of the same set, namely n!. This seems too trivial to
be of any use at all, but in fact it forms the basis of a conceptual proof of a very
famous theorem of Cayley:⁷


(3.10.1) Cayley’s Theorem on trees
The number of labelled trees on n vertices is $n^{n-2}$.


The definitions will be given somewhat briefly; graphs (and trees in particular) are discussed in
more detail in Chapter 11. A graph consists of a set of vertices and a set of edges, each edge consisting
of a pair of vertices. The edge is regarded as joining the two vertices. Graphs were mentioned in
Chapter 2, where we also introduced the distinction between labelled and unlabelled graphs. Here,
we will be counting labelled graphs; that is, the vertex set is always {1,2,..., n}, and two graphs are
the same precisely when they have the same set of edges.

A path in a graph is a sequence of vertices, all distinct except perhaps the first and the last,
with the property that consecutive vertices in the sequence are adjacent (joined by an edge). A graph
is connected if any two vertices are the ends of a path. A circuit is a path (having more than two
vertices) such that the first and last vertices are equal. A tree is a connected graph containing no
circuit. Cayley’s Theorem asserts that there are $n^{n-2}$ trees on n vertices.

We prove this theorem by counting slightly different structures called vertebrates. A vertebrate
is a tree with two distinguished vertices called the head and the tail, which may or may not be equal.
There is a path from the head to the tail, and it is unique (or else there would be a circuit); this path
is called the backbone. If T(n) is the number of trees on n vertices, then the number of vertebrates is
$n^2 T(n)$ (each of the head and tail is chosen from a set of size n). So it is enough to prove that there
are exactly $n^n$ vertebrates on the set N = {1,...,n}.
An endofunction on N is simply a function from N to itself. In fact, what we show is:


(3.10.2) Proposition. The numbers of vertebrates and endofunctions on N are equal. 


Obviously there are $n^n$ endofunctions; so this will prove Cayley’s Theorem. It would suffice to
find a bijection between vertebrates and endofunctions. But there is no ‘natural’ bijection, so we have 
to do something a bit more complicated. First, one more small piece of notation. A rooted tree is a 
tree with a single distinguished vertex (called, naturally, the root.) 
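
Before following the proof, it may be reassuring to confirm Cayley’s formula by brute force for small n. The sketch below is not the argument of the text: it simply tries every set of n − 1 edges on the vertex set {0, ..., n − 1} and uses a small union-find structure (an implementation choice of this sketch, as are the function names) to reject any set containing a circuit. A graph on n vertices with n − 1 edges and no circuit is connected, hence a tree.

```python
from itertools import combinations

def find(parent, v):
    """Follow parent pointers to the representative of v's component."""
    while parent[v] != v:
        v = parent[v]
    return v

def count_labelled_trees(n):
    """Count trees on the vertex set {0, ..., n-1} by exhaustive search (n >= 1)."""
    all_edges = list(combinations(range(n), 2))
    count = 0
    for edges in combinations(all_edges, n - 1):
        parent = list(range(n))
        for u, v in edges:
            ru, rv = find(parent, u), find(parent, v)
            if ru == rv:            # this edge would close a circuit
                break
            parent[ru] = rv
        else:                       # no circuit found: the edges form a tree
            count += 1
    return count

for n in range(2, 7):
    trees = count_labelled_trees(n)
    assert trees == n ** (n - 2)
    assert n * n * trees == n ** n  # vertebrates: choose a head and a tail
```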




7 The proof outlined here is adapted from an argument by André Joyal (1981). 

(3.11.1) Recurrence for Bell numbers
For $n \ge 1$,

$$B_n = \sum_{k=1}^{n} \binom{n-1}{k-1} B_{n-k}.$$


PROOF. Take X = {1,...,n}, and consider a partition of X. It has a unique part
containing n, say $\{n\} \cup Y$, where Y is a subset of the (n − 1)-set {1,..., n − 1}.
The remaining parts form a partition of the set $\{1, \ldots, n-1\} \setminus Y$. These data (the
subset Y, and the partition) determine the original partition uniquely. If |Y| = k − 1,
then there are $\binom{n-1}{k-1}$ choices of Y, and $B_{n-k}$ choices of a partition of the remaining
n − k points. Multiplying, and summing over all possible values of k (from 1 to n), gives
the result.

For example,

$$B_4 = \binom{3}{0}B_3 + \binom{3}{1}B_2 + \binom{3}{2}B_1 + \binom{3}{3}B_0 = 5 + (3\cdot 2) + (3\cdot 1) + 1 = 15.$$
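
The recurrence gives a direct way to tabulate Bell numbers. The following sketch assumes the usual starting value $B_0 = 1$ (the empty set has exactly one partition), which is also what the worked example uses; the function name `bell_numbers` is illustrative.

```python
from math import comb

def bell_numbers(limit):
    """Return [B_0, ..., B_limit] using the recurrence (3.11.1)."""
    bell = [1]                      # B_0 = 1
    for n in range(1, limit + 1):
        bell.append(sum(comb(n - 1, k - 1) * bell[n - k]
                        for k in range(1, n + 1)))
    return bell

print(bell_numbers(6))
```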


3.12. Generating combinatorial objects 


Combinatorial problems have a tendency to grow in size explosively as the size of 
the set increases. It often happens that a few small values can be done by hand, and 
then we have to resort to the computer to settle a few more cases.⁸ If the problem
involves checking all objects of some kind (subsets, permutations, etc.), then we need 
an algorithm to generate all of these. 

Usually the simplest algorithm (conceptually) involves recursion, based on the 
way in which the objects are built up from smaller ones. For example, here is 
a recursive algorithm for generating the power set of {1,...,n}. Note how the
algorithm resembles the proof of the recurrence relation F(n+1) = 2F(n), F(0) =1 
for the counting function. 


(3.12.1) Recursive algorithm: Power set of {1,...,n}
If n = 0, return {∅}.
Otherwise,
• generate the power set of {1,...,n−1};
• make a new copy of each subset and adjoin the element n to it;
• return the set of all sets created.


8 After this, brainwork is the only way. 


In symbols: $\mathcal{P}(\emptyset) = \{\emptyset\}$;
$\mathcal{P}(\{1, \ldots, n\}) = \{Y,\ Y \cup \{n\} : Y \in \mathcal{P}(\{1, \ldots, n-1\})\}$
for n > 0.
In a similar way, the recurrence relations
• $\binom{n}{k} = \binom{n-1}{k-1} + \binom{n-1}{k}$
• $n! = n \cdot (n-1)!$
• the recurrence relation for Bell numbers
suggest recursive algorithms for k-subsets, permutations, and partitions.
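
Algorithm 3.12.1 can be transcribed almost word for word. In this sketch the subsets are stored as frozensets in a list (a representation chosen for illustration, as is the name `power_set`).

```python
def power_set(n):
    """Return the power set of {1, ..., n} as a list of frozensets."""
    if n == 0:
        return [frozenset()]
    smaller = power_set(n - 1)
    # Keep each subset Y, and add a new copy of it with n adjoined.
    return smaller + [y | {n} for y in smaller]

subsets = power_set(3)
assert len(subsets) == 2 ** 3
print([sorted(s) for s in subsets])
```

The list doubles in length at each level of the recursion, mirroring the recurrence F(n+1) = 2F(n) for the counting function.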


However, there are disadvantages to this simple approach. The main one is that, 
even for moderate values of n, the set of all subsets (or all permutations) is so large 
that the computer’s memory will not hold it. What we have to do is, rather than 
creating all the objects in one step, generate them one at a time, process each one, 
and then throw it away when the next one is generated.⁹ The algorithm will have
the following general form. There are two parts. The first step generates the ‘first’
object. The second step takes any object and tries to calculate the ‘next’ one; if there
is no ‘next’ one (so that the current object is the last), it should report this fact. 
Then the structure of a program will be like this: 


Generate first object.
REPEAT
• process current object;
• generate next object
UNTIL there’s no next object.


One very important observation is that this set-up presupposes that the objects 
come in some order. But the order is not specified, except in the progression from 
each object to the next.¹⁰ So these algorithms implicitly define an ordering of the
relevant objects. 

For subsets of a set, we use the Odometer Principle from Chapter 2. Re-writing 
the algorithm given there, we get: 


(3.12.2) Algorithm: Subsets of {1,...,n}
FIRST SUBSET is ∅.
NEXT SUBSET after Y:
• Find the last element i not in Y (working back from the end).
• If there’s no such element, then Y was the last subset.
• Remove from Y all elements after i, and add i to Y. Return this set.


9 If you are writing programs implementing the following algorithms, a good ‘minimal processing’ is
to count the objects; this provides an additional check on the correctness of the program. 
10 Compare the remarks about the order of the natural numbers in Chapter 2. 
This displays the principle correctly. In practice, it would be more efficient to
combine the steps. Thus, we take a pointer i, initialised to n. While i ∈ Y, we
remove i from Y and decrease i by 1. If we fall off the bottom (i.e., reach i = 0),
then Y was the last set. Otherwise, add the final value of i to the set Y and return
the result.
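The combined form of the step can be sketched in Python along the following lines. The names `next_subset` and `all_subsets` are illustrative assumptions, and returning `None` is just one possible way of reporting that there is no next object.

```python
def next_subset(y, n):
    """Return the subset of {1, ..., n} following y, or None if y is the last one."""
    y = set(y)
    i = n                        # pointer, initialised to n
    while i >= 1 and i in y:
        y.remove(i)              # strip the trailing elements
        i -= 1
    if i == 0:
        return None              # fell off the bottom: y was {1, ..., n}
    y.add(i)
    return y

def all_subsets(n):
    """Generate the subsets one at a time, starting from the empty set."""
    y = set()
    while y is not None:
        yield frozenset(y)
        y = next_subset(y, n)

print([sorted(s) for s in all_subsets(2)])
```

Only the current subset is kept in memory, which is the point of the first/next scheme; counting the objects produced is a convenient check.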

Note that, if we represent a subset Y of X by its ‘characteristic function’, the
sequence $(a_1,\ldots,a_n)$ with

\[ a_i = \begin{cases} 1 & \text{if } i \in Y, \\ 0 & \text{if } i \notin Y, \end{cases} \]

and interpret this as the base 2 representation of an integer $N = a_1 2^{n-1} + \cdots + a_n$,
then the algorithm proceeds through the integers from 0 to $2^n - 1$ in order. This
ordering of the subsets of a set was discovered by Shao Yung (1160), who proposed
it as an alternative to the traditional order of the sixty-four I Ching hexagrams
attributed to King Wên (ca. 1150 BC); and independently and much later by
Leibniz (1703). So we could simplify the algorithm, using the computer’s inbuilt
arithmetic. We define the set corresponding to a non-negative integer N by writing
N to the base 2 and interpreting the result as a characteristic function. If we denote
the set corresponding to N by Y(N), then the FIRST SUBSET is Y(0); and the NEXT
SUBSET after Y(N) is Y(N + 1). (The ‘next subset’ procedure fails if $N = 2^n - 1$.)

This procedure has an additional advantage, in that it gives us ‘random access’
to the subsets of a set: we can easily produce the Nth set Y(N) for any N with
$0 \le N \le 2^n - 1$. However, for other cases considered below, it is harder to do this.
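A possible rendering of the map N ↦ Y(N) in Python is given below. It is a sketch under the convention used above, in which the element n corresponds to the least significant binary digit; the function name `subset_from_index` is hypothetical.

```python
def subset_from_index(N, n):
    """Return Y(N): the subset of {1, ..., n} whose characteristic function
    (a_1, ..., a_n) is the base-2 representation of N, with a_n the units digit."""
    if not 0 <= N < 2 ** n:
        raise ValueError("N must satisfy 0 <= N <= 2^n - 1")
    # element i belongs to Y(N) exactly when the digit a_i, of weight 2^(n-i), is 1
    return {i for i in range(1, n + 1) if (N >> (n - i)) & 1}

print([sorted(subset_from_index(N, 2)) for N in range(4)])
```

Stepping N from 0 to $2^n - 1$ visits the subsets in the same order as the pointer-based procedure, and any single N can be decoded directly.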


Consider the problem of generating all the k-subsets of a set. Here, there are 
two essentially different ‘natural’ orders in which the subsets could be generated, 
exemplified by the case n = 5, k = 3:¹²


123, 124, 125, 134, 135, 145, 234, 235, 245, 345;
123, 124, 134, 234, 125, 135, 235, 145, 245, 345.


The first ordering is generated by a fragment of program which (in BASIC) would 
look like this (for k = 3, n arbitrary): 


FOR i = 1 TO n-2
  FOR j = i+1 TO n-1
    FOR k = j+1 TO n
      process {i,j,k}
    NEXT k
  NEXT j
NEXT i


11 It is said that, after Leibniz’ discovery, he was informed of the Chinese precedence by a Jesuit
missionary, Fr. Joachim Bouvet. But Leibniz went further, using the binary representation for 
arithmetic where Shao Yung was concerned only with the progression. For further discussion, see S. 
N. Afriat, The Ring of Linked Rings (1982). 

12 In fact, reversing the order of the numbers {1,...,n} and the order of the subsets takes the first
ordering to the second. But, to a computer, this would look like time reversal: not an easy trick! 
(Hopefully this is clear even to non-programmers.) This seems a natural way to do
it. But it only works in this form if k is small and fixed. Also, the other order
has a subtle advantage. Observe that the 3-subsets of {1,...,4} occur first, in 
their ‘natural’ order, followed by the subsets containing 5 (which are obtained by 
adjoining 5 to the 2-subsets of {1,...,4} in their natural order). This is in accord 
with the recursive version discussed earlier. Anyway, the following algorithm does 
the job (producing the second order above): 


(3.12.3) Algorithm: k-subsets of {1,...,n}
FIRST SUBSET is {1,...,k}.
NEXT SUBSET after $Y = \{y_1,\ldots,y_k\}$, where $y_1 < \cdots < y_k$:
• Find the first i such that $y_i + 1 \notin Y$;
• increase $y_i$ by 1, set $y_j = j$ for j < i, and return the new set Y;
• this fails if i = k, $y_k = n$, in which case Y = {n-k+1,...,n} is the last set.
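The steps of (3.12.3) might be coded as in the following Python sketch. The function names and the representation of a k-subset as a sorted list are assumptions made for illustration only.

```python
def next_k_subset(y, n):
    """y is a sorted list y_1 < ... < y_k of elements of {1, ..., n}.
    Return the next k-subset in the order of (3.12.3), or None if y is the last."""
    y = list(y)
    members = set(y)
    for i in range(len(y)):              # i is a 0-based position here
        if y[i] + 1 not in members:      # first i with y_i + 1 not in Y
            if y[i] == n:                # only possible for the largest element
                return None              # Y = {n-k+1, ..., n} is the last set
            y[i] += 1
            for j in range(i):           # reset the earlier elements to 1, 2, ...
                y[j] = j + 1
            return y
    return None                          # k = 0: the empty set has no successor

def k_subsets(n, k):
    y = list(range(1, k + 1))            # FIRST SUBSET is {1, ..., k}
    while y is not None:
        yield tuple(y)
        y = next_k_subset(y, n)

print(list(k_subsets(5, 3)))
```

For n = 5, k = 3 this yields the second of the two orders displayed earlier.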


The two ‘natural’ orders of k-sets can be characterised as follows. The first is 
the so-called lexicographic order. This means that, if we regard the symbols 1,...,n
as letters of an alphabet, and regard each k-set as a word by writing its elements in 
alphabetical order, then the words occur in lexicographic order (the order in which 
they would be found in a dictionary). The second order is reverse lexicographic: we 
turn k-sets into words as above, but then reverse each word before putting them in 
dictionary order. 

Lexicographic order or something similar is usually the most natural for problems
of this kind. The next algorithm, for permutations, uses lexicographic order,
where a permutation is taken in passive form. (That is, we regard a permutation
as an n-tuple $(x_1,\ldots,x_n)$, where $x_1,\ldots,x_n$ are 1,...,n in some order.) Here is the
algorithm.


(3.12.4) Algorithm: Permutations of {1,...,n}
FIRST PERMUTATION is given by $x_i = i$ for i = 1,...,n.
NEXT PERMUTATION after $(x_1,\ldots,x_n)$:
• Find the largest j for which $x_j < x_{j+1}$ (working back from the end).
• If no such j exists, then the current permutation is the last.
• Interchange the value of $x_j$ with the least $x_k$ greater than $x_j$
with k > j; then reverse the sequence of values of $x_{j+1},\ldots,x_n$;
return this permutation.


Here is an example. Suppose the current permutation is (436521). The algorithm
first locates j = 2, $x_j = 3$. We assert that the current permutation is the last (in
lexicographic order) of the form (43...), and should be followed by the first of the
form (45...), namely (451236). To obtain this, we find k = 4, $x_k = 5$. (Since the
values after $x_j$ are decreasing, this can be located by working back from the end
until we first find a value greater than $x_j$.) Then we interchange the entries in the
second and fourth positions, giving (456321); and reverse the entries in positions 3
to 6, giving (451236), as required.
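The worked example can be replayed with a short Python sketch of (3.12.4). The function name `next_permutation` and the use of 0-based list positions are choices made here for illustration.

```python
def next_permutation(x):
    """Return the permutation following x in lexicographic order, or None if x is the last."""
    x = list(x)
    n = len(x)
    j = n - 2
    while j >= 0 and x[j] >= x[j + 1]:   # largest j with x_j < x_{j+1}
        j -= 1
    if j < 0:
        return None                      # x is decreasing: the last permutation
    k = n - 1
    while x[k] <= x[j]:                  # the tail is decreasing, so the first value
        k -= 1                           # exceeding x_j from the end is the least such
    x[j], x[k] = x[k], x[j]
    x[j + 1:] = reversed(x[j + 1:])      # reverse the entries after position j
    return x

print(next_permutation([4, 3, 6, 5, 2, 1]))
```

Starting from the identity and repeating until `None` is returned visits every permutation once.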


It is much harder to give an algorithm of this kind for partitions of a set. This 
is related to the non-existence of a simple formula for the Bell number. 


3.13. Exercises 


1. A restaurant near Vancouver offered Dutch pancakes with ‘a thousand and one 
combinations’ of toppings. What do you conclude? 


2. Using the numbering of subsets of {0,1,...,n-1} defined in Section 3.1, prove
that, if $X_k \subseteq X_l$, then k ≤ l (but not conversely).


3. Prove the following identities: 
GOGE 
orh) CE) 


(recall the convention that $\binom{n}{k} = 0$ if k < 0 or k > n).


n+: _ {nt+kt+l 
a ~ k ` 


OS 
(a) È k (?) = nor}, 
ocr) ={Capey inte 


4. Following the method in the text, calculate the number of subsets of an n-set of
size congruent to m (mod 3) (m = 0, 1, 2) for each value of n (mod 6).


5. Let k be a given positive integer. Show that any non-negative integer N can be
written uniquely in the form

\[ N = \binom{x_k}{k} + \binom{x_{k-1}}{k-1} + \cdots + \binom{x_1}{1}, \]

where $0 \le x_1 < \cdots < x_{k-1} < x_k$. [Hint: Let x be such that $\binom{x}{k} \le N < \binom{x+1}{k}$.
Then any possible representation has $x_k = x$. Now use induction and the fact
that $N - \binom{x}{k} < \binom{x}{k-1}$ (Fact 3.2.5) to show the existence and uniqueness of the
representation.]

Show that the order of k-subsets corresponding in this way to the usual order 
of the natural numbers is the same as the reverse lexicographic order generated by 


the algorithm in Section 3.11. (Hint: J“ (72) = (+) 


T 
k-1 


j=0 (i=j i 
6. Use the fact that $(1+t)^p \equiv 1 + t^p \pmod p$ to prove by induction that $n^p \equiv n \pmod p$
for all positive integers n.
7. A computer is to be used to calculate values of binomial coefficients. The largest
integer which can be handled by the computer is 32767. Four possible methods are
proposed:

(1) $\binom{n}{k} = n!/(k!\,(n-k)!)$;
(2) $\binom{n}{k} = n(n-1)\cdots(n-k+1)/k!$;
(3) $\binom{n}{0} = 1$, $\binom{n}{k} = \binom{n}{k-1}\cdot(n-k+1)/k$ for k > 0;
(4) $\binom{n}{0} = \binom{n}{n} = 1$, $\binom{n}{k} = \binom{n-1}{k-1} + \binom{n-1}{k}$ for 0 < k < n (i.e. Pascal’s Triangle).

For which values of n and k can $\binom{n}{k}$ be calculated by each method? What can you
say about the relative speed of the different methods?


8. Show that there are (n-1)! cyclic permutations of a set of n points.


9. The order of a permutation π is the least positive integer m such that $\pi^m$ is the
identity permutation. Prove that the order of a cycle on n points is n. Prove that 
the order of an arbitrary permutation is the least common multiple of the lengths 
of the cycles in its cycle decomposition. 


10. How many words can be made from the letters of the word ESTATE? 


11. Given n letters, of which m are identical and the rest are all distinct, find a
formula for the number of words which can be made. 


12. Show that, for n = 2,3,4,5,6, the number of unlabelled trees on n vertices is 1, 
1, 2, 3, 6 respectively. 

13. The line segments from (i, log i) to (i+1, log(i+1)) lie below the curve y = log x.
(This is because the curve is convex, i.e., its second derivative $-1/x^2$ is negative.)
The area under these line segments from i = 1 to i = n is $\log n! + \frac12\log(n+1)$, since
it consists of the rectangles of Fig. 3.1(b) together with triangles with width 1 and
heights summing to log(n + 1). Deduce that

\[ n! < e\sqrt{n+1}\left(\frac{n}{e}\right)^n. \]

[REMARK. According to Stirling’s Formula, the limiting ratio of this upper bound to
n! is $e/\sqrt{2\pi} = 1.0844\ldots$]


14. Use Stirling’s Formula to prove that

\[ \binom{2n}{n} \sim \frac{2^{2n}}{\sqrt{\pi n}}. \]


15. (a) Let n = 2k be even, and X a set of n elements. Define a factor to be
a partition of X into k sets of size 2. Show that the number of factors is equal
to $1\cdot 3\cdot 5\cdots(2k-1)$. This number is sometimes called a double factorial, written
(2k-1)!! (with !! regarded as a single symbol, the two exclamation marks suggesting
the gap of two, not the factorial function iterated!)

(b) Show that a permutation of X interchanges some k-subset with its comple- 
ment if and only if all its cycles have even length. Prove that the number of such 
permutations is $((2k-1)!!)^2$. [HINT: any pair of factors defines a partition of X into
a disjoint union of cycles, and conversely. The correspondence is not one-to-one, 
but the non-bijectiveness exactly balances] 

(c) Deduce that the probability that a random element of $S_n$ interchanges some
$\frac12 n$-set with its complement is $O(1/\sqrt{n})$. [HINT: You will probably need two analytic
facts: $1 - x < e^{-x}$ for positive x; and $\sum_{i=1}^{n}(1/i) = \log n + O(1)$.]


16. How many relations on an n-set are there? How many are (a) reflexive, (b) 
symmetric, (c) reflexive and symmetric, (d) reflexive and antisymmetric? 


17. Given a relation R on X, define
\[ R^+ = \{(x,y) : (x,y) \in R \text{ or } x = y\}. \]

Prove that the map $R \mapsto R^+$ is a bijection between the irreflexive, antisymmetric and
transitive relations on X, and the reflexive, antisymmetric and transitive relations
on X. Show further that this bijection preserves the property of trichotomy.


REMARK. This exercise shows that it doesn’t matter whether we use the ‘less than’ or 
the ‘less than or equal’ model for order relations. 


18. Recall that a partial preorder is a relation R on X which is reflexive and transitive.
Let R be a partial preorder. Define a relation S by the rule that (x,y) ∈ S if and
only if both (x,y) and (y,x) belong to R. Prove that S is an equivalence relation.
Show further that R ‘induces’ a partial order $\bar{R}$ on the set of equivalence classes
of S in a natural way: if (x,y) ∈ R, then $(\bar{x},\bar{y}) \in \bar{R}$, where $\bar{x}$ is the S-equivalence
class containing x, etc. (You should verify that this definition is independent of the
choice of representatives x and y.)

Conversely, let X be a set carrying a partition, and R’ a partial order on the 
parts of the partition. Prove that there is a unique partial preorder on X giving rise 
to this partition and partial order as in the first part of the question. 

Show further that the results of this question remain valid if we replace partial 
preorder and partial order by preorder and order respectively, where a preorder is a 
partial preorder satisfying trichotomy. 


19. List the (a) partial preorders, (b) preorders, (c) partial orders, (d) orders on the
set {1,2,3}. 


20. Prove that $B_n < n!$ for all n > 2. [HINT: associate a partition with each
permutation.] 


21. Verify, theoretically or practically, the following algorithm for generating all 
partial permutations of {1,...,n}: 


(3.13.1) Algorithm: Partial permutations of {1,...,n}
FIRST PARTIAL PERMUTATION is the empty sequence.
NEXT OBJECT after $(x_1,\ldots,x_m)$:
• If the length m of the current sequence is less than n, extend it
by adjoining the least element it doesn’t contain.
• Otherwise, proceed as in the algorithm for permutations, up to
the point where $x_j$ and $x_k$ are interchanged; then, instead of
reversing the terms after $x_j$, remove them from the sequence.


22. Verify the following recursive procedure for generating the set of partitions of a
set X.


(3.13.2) Recursive algorithm: Partitions of X
If X = ∅, then ∅ is the only partition.
If X ≠ ∅, then
• select an element x ∈ X;
• generate all subsets of X \ {x};
• for each subset Y, generate all partitions of X \ ({x} ∪ Y), and
adjoin to each the additional part {x} ∪ Y.


23. Let $A = (a_{ij})$ and $B = (b_{ij})$ be (n+1) × (n+1) matrices (with rows and columns
indexed from 0 to n) defined by $a_{ij} = \binom{i}{j}$, $b_{ij} = (-1)^{i+j}\binom{i}{j}$ (where $\binom{i}{j} = 0$ if i < j).
Prove that $B = A^{-1}$. [Hint: let V be the vector space of polynomials of degree at
most n, with basis $1, t, t^2, \ldots, t^n$. Show that A represents the linear transformation
$f(t) \mapsto f(t+1)$. What transformation is represented by B?]


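24. PROJECT. A couple of harder binomial identities. Prove:

(a) \[ \sum_{k=0}^{n} \binom{2n+1}{2k+1}\binom{m+k}{2n} = \binom{2m}{2n}; \]

(b) \[ \sum_{k=0}^{n} (-1)^k \binom{n}{k}^3 = \begin{cases} (-1)^m (3m)!/(m!)^3 & \text{if } n = 2m; \\ 0 & \text{if } n \text{ is odd.} \end{cases} \]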
25. PROJECT. There are many different proofs of Cayley’s Theorem. Look one up in 
a graph theory textbook, and present it in your own words. 


26. PROJECT. A forest is a graph without cycles. Prove that the number F(n) of
forests on the set {1,...,n} satisfies the recurrence relation

\[ F(n) = \sum_{k=1}^{n} \binom{n-1}{k-1} k^{k-2} F(n-k). \]

Calculate the ratio of F(n) to the number $n^{n-2}$ of trees for small n. What can you
say about this ratio in the limit?


4. Recurrence relations and 
generating functions 


The way begets one; one begets two; two begets three; three begets the 
myriad creatures. 


Lao Tse, Tao Te Ching (ca. 500 BC) 


TOPICS: Fibonacci, Catalan and Bell numbers, derangements, [finite
fields, sorting, binary trees, ‘Twenty Questions’]


TECHNIQUES: Recurrence relations, solution of linear recurrence 
relations with constant coefficients, generating functions and their 
manipulation, [the ring of formal power series] 


ALGORITHMS: Computation of Fibonacci numbers, [QUICKSORT] 


CROSS-REFERENCES: Derangements (Chapter 5), set partitions 
(Chapter 3) 


A recurrence relation expresses the value of a function f at the natural number n 
in terms of its values at smaller natural numbers. We saw a simple example of this 
already: the number F(n) of subsets of an n-set satisfies F(n + 1) = 2F(n). This 
relation, together with the initial value F(0) = 1, determines the value of F for every 
natural number. In this chapter, we examine recurrence relations in more detail. 
An important technique, often associated with recurrence relations but useful in 
its own right, is that of generating functions. These are power series whose coefficients 
form the number sequence in question. We show how generating functions can be 
used either to solve recurrence relations explicitly, or to derive some information 
about the (unknown) solution. The techniques look suspiciously like analysis! 


To begin, here is an introductory example of a proof by generating function. 
Let F(n) be the number of subsets of an n-set. We saw several times already that
$F(n) = 2^n$; now we will evaluate F(n) by yet another method, seemingly more
complicated but in fact of very general applicability. Set

\[ \phi(t) = \sum_{n=0}^{\infty} F(n)t^n. \]


1 To Newton, ‘analysis’ meant manipulation of power series. See V. I. Arnol’d, Huygens & Barrow, 
Newton & Hooke (1990). 
(Don’t worry for the moment about whether this power series converges.) Now

\[ \begin{aligned} 2t\phi(t) &= \sum_{n=0}^{\infty} 2F(n)t^{n+1} \\ &= \sum_{n=0}^{\infty} F(n+1)t^{n+1} \\ &= \phi(t) - 1, \end{aligned} \]

the last equality holding because the sum is identical with the definition of $\phi(t)$
(with n + 1 replacing n) except that the first term $F(0)t^0 = 1$ is missing. Thus

\[ \phi(t) = \frac{1}{1-2t}. \]

The right-hand side is the sum of a geometric progression:

\[ \phi(t) = \sum_{n=0}^{\infty} (2t)^n. \]

Comparing this with the original series, we conclude that $F(n) = 2^n$. (If two power
series are equal, then all their coefficients coincide.)
Incidentally, we now see that the power series converges for all t with $|t| < \frac12$;
so our manipulations are justified by analysis. We will return to this question of 
justification later. First, however, we do a less trivial example. 


4.1. Fibonacci numbers 


PROBLEM. In how many ways can the non-negative integer n be written as a sum of
ones and twos (in order)?


Let $F_n$ be this number. Then, for example, $F_4 = 5$, since
4 = 1+1+1+1 = 2+1+1 = 1+2+1 = 1+1+2 = 2+2.


Similarly, we find that $F_1 = 1$, $F_2 = 2$, $F_3 = 3$. By convention, we take $F_0 = 1$: the
only solution for n = 0 is the empty sequence.

Suppose that n ≥ 2. Any expression for n as a sum of ones and twos must end
with either a 1 or a 2. If it ends with 1, then the preceding terms sum to n-1; if it
ends with a 2, they sum to n-2. So we have

\[ F_n = F_{n-1} + F_{n-2}. \]

The numbers $F_0, F_1, F_2, \ldots$ are called the Fibonacci numbers.
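The counting argument can be tested by brute force. The sketch below, in Python, lists the ordered sums directly and compares their number with the value given by the recurrence; the function names `compositions` and `fib` are illustrative assumptions, and `fib` uses the convention $F_0 = F_1 = 1$ adopted here.

```python
def compositions(n):
    """All ways of writing n as an ordered sum of ones and twos (as lists)."""
    if n == 0:
        return [[]]                                       # the empty sequence
    result = [c + [1] for c in compositions(n - 1)]       # sums ending in 1
    if n >= 2:
        result += [c + [2] for c in compositions(n - 2)]  # sums ending in 2
    return result

def fib(n):
    """F_n from the recurrence F_n = F_{n-1} + F_{n-2}, with F_0 = F_1 = 1."""
    a, b = 1, 1
    for _ in range(n):
        a, b = b, a + b
    return a

print(len(compositions(4)), fib(4))
```

The two ways of splitting the sums in `compositions` are exactly the two cases of the argument above.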

This is an example of a recurrence relation, more specifically, a three-term linear 
recurrence relation with constant coefficients. The meaning of these terms is, I hope, 
obvious. But, in general, a (k + 1)-term recurrence relation expresses any value F(n) 
of a function in terms of the k preceding values F(n — 1), F(n — 2),...,F(n—k); 
it is linear if it has the form

\[ F(n) = a_1(n)F(n-1) + a_2(n)F(n-2) + \cdots + a_k(n)F(n-k), \]

where $a_1,\ldots,a_k$ are functions of n; and it is linear with constant coefficients if
$a_1,\ldots,a_k$ are constants. We will see examples later of recurrence relations in which
the value of F(n) depends on all the preceding values, in a highly non-linear way;
so this one is very special.


Fact: A function satisfying a (k + 1)-term recurrence relation is 
uniquely determined by its values on the first k natural numbers. 


(The first k natural numbers could be 0,...,k-1 or 1,...,k, depending on context.)

For, if we know F(1),...,F(k) (say), then these values determine F(k+1), and
then the values F(2),...,F(k+1) determine F(k+2), and so on. The words and
so on are a signal that we are using induction. Formally, if two functions F and G
satisfy the same recurrence relation and agree on the first k natural numbers, then
one proves by induction that they agree everywhere. 

This is rather like the situation with differential equations, where we expect a 
kth order d.e. and k initial conditions to determine a solution uniquely. However,
our situation is very much simpler in one way: the existence and uniqueness follows 
immediately from the Principle of Induction, without the need for any hard analysis. 
For any recurrence relation whatever, it is usually obvious just what sort of initial 
values are required to determine the solution uniquely. 


We turn to methods for solving the recurrence relation: 


(4.1.1) Fibonacci Recurrence Relation
For n ≥ 2,
\[ F_n = F_{n-1} + F_{n-2}. \]


Two methods will be given; both of them generalise. 
FIRST METHOD. Since the recurrence relation is linear, if we can find any solutions,
we can take linear combinations of them to generate new solutions. (Again this is
like what happens with differential equations.) Specifically, let F and G satisfy the
recurrence relation above, and let $H_n = aF_n + bG_n$. Then
\[ \begin{aligned} H_n &= aF_n + bG_n \\ &= a(F_{n-1} + F_{n-2}) + b(G_{n-1} + G_{n-2}) \\ &= (aF_{n-1} + bG_{n-1}) + (aF_{n-2} + bG_{n-2}) \\ &= H_{n-1} + H_{n-2}. \end{aligned} \]
We try to fit the initial conditions by choice of a and b.

Try a solution of the form $F_n = \alpha^n$. (The justification for this will be that it
works!) We require
\[ \alpha^n = \alpha^{n-1} + \alpha^{n-2}, \qquad\text{that is,}\qquad \alpha^{n-2}(\alpha^2 - \alpha - 1) = 0. \]

So, if $\alpha^2 - \alpha - 1 = 0$, the recurrence holds for all n.
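The roots of this equation are $\alpha = \frac12(1+\sqrt5)$, $\beta = \frac12(1-\sqrt5)$. So we have a
general solution of the form

\[ F_n = a\left(\frac{1+\sqrt5}{2}\right)^n + b\left(\frac{1-\sqrt5}{2}\right)^n. \]

To fit the initial conditions (which are $F_0 = 1$, $F_1 = 1$ in our case), we require

\[ a + b = 1, \qquad a\left(\frac{1+\sqrt5}{2}\right) + b\left(\frac{1-\sqrt5}{2}\right) = 1, \]

whence $a + b = 1$, $a - b = 1/\sqrt5$, giving

\[ a = \frac{\sqrt5+1}{2\sqrt5}, \qquad b = \frac{\sqrt5-1}{2\sqrt5}. \]


(4.1.2) Fibonacci numbers

\[ F_n = \frac{1}{\sqrt5}\left[\left(\frac{1+\sqrt5}{2}\right)^{n+1} - \left(\frac{1-\sqrt5}{2}\right)^{n+1}\right]. \]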


Remarks. 1. $\frac{1+\sqrt5}{2} = 1.618\ldots$, and $\frac{1-\sqrt5}{2} = -0.618\ldots$. So the function grows
exponentially; for large n, its value is the nearest integer to $\left(\frac{\sqrt5+1}{2\sqrt5}\right)\left(\frac{1+\sqrt5}{2}\right)^n$.
2. Note that we could easily find values of a and b to fit any given initial values.
3. We'll see that, for some purposes, the explicit formula is less useful than the
recurrence relation.
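A quick numerical comparison of the explicit formula with the recurrence is sketched below in Python. It relies on floating-point arithmetic, so it is only trustworthy for moderate n (which itself illustrates Remark 3); the function names are assumptions made for this sketch.

```python
from math import sqrt

ALPHA = (1 + sqrt(5)) / 2
BETA = (1 - sqrt(5)) / 2

def fib_recurrence(n):
    """F_n from F_n = F_{n-1} + F_{n-2} with F_0 = F_1 = 1 (exact integer arithmetic)."""
    a, b = 1, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def fib_formula(n):
    """F_n from the explicit formula, evaluated in floating point."""
    return (ALPHA ** (n + 1) - BETA ** (n + 1)) / sqrt(5)

def fib_nearest(n):
    """Nearest integer to the dominant term alpha^(n+1) / sqrt(5)."""
    return round(ALPHA ** (n + 1) / sqrt(5))

print([fib_recurrence(n) for n in range(8)])
print([round(fib_formula(n)) for n in range(8)])
```

For the range tested, dropping the term in β and rounding already gives the exact value.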


SECOND METHOD. We now solve the recurrence relation using the technique of
generating functions. We let $\phi(t)$ be the power series

\[ \phi(t) = \sum_{n \ge 0} F_n t^n, \]

where t is an indeterminate.


We have
\[ t\phi(t) = \sum_{n \ge 0} F_n t^{n+1} = \sum_{n \ge 1} F_{n-1} t^n, \]
\[ t^2\phi(t) = \sum_{n \ge 0} F_n t^{n+2} = \sum_{n \ge 2} F_{n-2} t^n. \]
(Be clear about what is happening here. To get from the second term to the third
in each equation, we have used the fact that n is only a ‘dummy variable’ whose
actual name is not important. So, for example, in the first equation, we substitute
m = n + 1, and then replace the dummy variable m by n. If this confuses you, write
out the first few terms of both sums.)
Now $F_n = F_{n-1} + F_{n-2}$, so it is ‘almost true’ that $\phi(t) = (t + t^2)\phi(t)$. Certainly,
the coefficients of $t^2$ and all higher powers will be the same on both sides of
this equation, but we might have to adjust the constant term and the term in t.
Remember that $F_0 = 1$, $F_1 = 1$.

The coefficient of t is $F_1$ on the left and $F_0$ on the right, so these agree. The
constant term is $F_0$ on the left and 0 on the right, so we have to add 1 to the
right-hand side to obtain equality. Thus,

\[ \phi(t) = 1 + (t + t^2)\phi(t), \]

whence

\[ \phi(t) = \frac{1}{1 - t - t^2}. \]
Now the value of $F_n$ is the coefficient of $t^n$ in the Taylor series for this function. This
is most easily found by a partial fraction expansion. Let $1 - t - t^2 = (1 - \alpha t)(1 - \beta t)$.
Thus, α and β are roots of $x^2 - x - 1 = 0$; so $\alpha = \frac{1+\sqrt5}{2}$, $\beta = \frac{1-\sqrt5}{2}$. (The same
as before: no coincidence!) If we let

\[ \frac{1}{(1 - \alpha t)(1 - \beta t)} = \frac{a}{1 - \alpha t} + \frac{b}{1 - \beta t}, \]

then
\[ 1 = a(1 - \beta t) + b(1 - \alpha t), \]

so a + b = 1, aβ + bα = 0. These equations can be solved for a and b (with the same
solution as before!).
Now
\[ \phi(t) = \frac{a}{1 - \alpha t} + \frac{b}{1 - \beta t} = a(1 + \alpha t + \alpha^2 t^2 + \cdots) + b(1 + \beta t + \beta^2 t^2 + \cdots); \]

equating coefficients of $t^n$, we find that
\[ F_n = a\alpha^n + b\beta^n. \]

4.2. Aside on formal power series 


Once we have found the power series in the above argument, we can use the theory
of power series to show that it converges for |t| < 1/α, and so the manipulations
above are justified analytically. But in fact there is a theory of formal power series,
according to which it is legitimate to do such manipulations without any regard to
questions of convergence. This is important in cases where either the series don’t
converge for any non-zero value of t, or we are unable to find out enough about
it to resolve the question of convergence. In this section, I outline the algebraic
formalism for this. If you feel comfortable with the arguments of the last section,
there is no need to read what follows.
A formal power series over a field F should be thought of as an expression

\[ \sum_{n \ge 0} a_n t^n = a_0 + a_1 t + a_2 t^2 + \cdots, \]

but more formally it is an infinite sequence $(a_0, a_1, a_2, \ldots)$ of elements of F. (In fact, the definition
will work over an arbitrary ring.) The set of all formal power series has operations of addition and
multiplication defined on it, under which it forms a ring. Also, we can differentiate (and we have a
differential ring). There are additional operations defined only for certain formal power series, such
as infinite sums and products, and substitution; we will define these informally as required.


The addition and multiplication are exactly what you would expect: you add and multiply
‘term-by-term’. That is,

\[ \Bigl(\sum_{n \ge 0} a_n t^n\Bigr) + \Bigl(\sum_{n \ge 0} b_n t^n\Bigr) = \sum_{n \ge 0} (a_n + b_n) t^n, \]

\[ \Bigl(\sum_{n \ge 0} a_n t^n\Bigr) \cdot \Bigl(\sum_{n \ge 0} b_n t^n\Bigr) = \sum_{n \ge 0} c_n t^n, \]

where

\[ c_n = \sum_{i=0}^{n} a_i b_{n-i}. \]

It can be checked with some effort that these operations are associative and commutative, and that
the distributive law holds.

For example, we can sum geometric progressions:

\[ \sum_{n \ge 0} c^n t^n = \frac{1}{1 - ct}. \]

This is easily verified by showing that $(1 - ct)\bigl(\sum_{n \ge 0} (ct)^n\bigr) = 1$.
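The multiplication rule, and the verification of the geometric series, can be tried out on truncated series. The following Python sketch represents a formal power series by a finite list of its leading coefficients; this representation and the function name `multiply` are assumptions made here for illustration.

```python
def multiply(a, b, terms):
    """First `terms` coefficients of the product of two power series.

    a[n] and b[n] are the coefficients of t^n; coefficients beyond the end
    of a list are taken to be zero.
    """
    def coeff(series, n):
        return series[n] if n < len(series) else 0

    # c_n = sum_{i=0}^{n} a_i * b_{n-i}
    return [sum(coeff(a, i) * coeff(b, n - i) for i in range(n + 1))
            for n in range(terms)]

c = 3
terms = 8
geometric = [c ** n for n in range(terms)]      # 1 + ct + (ct)^2 + ...
product = multiply([1, -c], geometric, terms)   # (1 - ct) times the series
print(product)
```

Each coefficient of the product depends only on finitely many coefficients of the factors, which is why truncation does no harm for the terms shown and why no question of convergence arises.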


Another very important operation on formal power series is differentiation:

\[ \frac{d}{dt}\Bigl(\sum_{n \ge 0} a_n t^n\Bigr) = \sum_{n \ge 1} n a_n t^{n-1}. \]

The standard rules of elementary calculus for differentiating sums and products hold in this situation.
The standard functions of analysis are defined as formal power series by their usual Taylor series:
for example,

\[ \exp(t) = \sum_{n \ge 0} \frac{t^n}{n!}, \qquad \log(1 + t) = \sum_{n \ge 1} \frac{(-1)^{n-1} t^n}{n}. \]

They satisfy the usual differential equations: $\frac{d}{dt}\exp(t) = \exp(t)$, $\frac{d}{dt}\log(1 + t) = 1/(1 + t)$.
We can add infinitely many formal power series as long as we are never required to add infinitely
many field elements. So, for example, if $g(t) = \sum_{n \ge 1} b_n t^n$ is a formal power series whose constant
term is zero, then $(g(t))^n$ has no term involving powers of t less than $t^n$. Thus it makes sense to
evaluate

\[ \sum_{n \ge 0} a_n (g(t))^n, \]

since a given term, say the term in $t^m$, only contains contributions from expressions $a_n (g(t))^n$
for n ≤ m. The resulting formal power series is obtained by substitution of g into f, where
$f(t) = \sum_{n \ge 0} a_n t^n$. We see that we can substitute one formal power series into another, provided the
first has constant term zero.

Substitution behaves as one would expect: for example, 


\[ \exp(\log(1 + t)) = 1 + t, \qquad \log(1 + (\exp(t) - 1)) = t. \]

(Note that log(1 + t) and exp(t) - 1 do have zero constant term.) Furthermore, if f and g have
constant term 0, then exp(f) and exp(g) are defined, and

\[ \exp(f) \cdot \exp(g) = \exp(f + g). \]


One notable example of a formal power series is provided by the Binomial Theorem for a general 
exponent. In our situation, the following statement is a definition, not a theorem: 


(4.2.1) Binomial Theorem
For any real number r,

\[ (1 + t)^r = \sum_{n \ge 0} \binom{r}{n} t^n. \]


(Here the ‘binomial coefficient’ $\binom{r}{n}$ is defined by

\[ \binom{r}{n} = \frac{r(r-1)\cdots(r-n+1)}{n!}; \]

if r is a positive integer, this agrees with the usual definition, and it vanishes for n > r.)
Now it can be verified that the ‘law of exponents’ holds:

\[ (1 + t)^r \cdot (1 + t)^s = (1 + t)^{r+s}. \]

For r = -1, this agrees with our calculation of the sum of a geometric progression above (with
c = -1). Moreover, we can define $((1 + t)^r)^s$ by substitution, since $(1 + t)^r$ has the form 1 + f(t) where
f has constant term zero; and we find that

\[ ((1 + t)^r)^s = (1 + t)^{rs}. \]

Finally, we have
\[ \frac{d}{dt}(1 + t)^r = r(1 + t)^{r-1}. \]

(This follows from the easily-checked identity $n\binom{r}{n} = r\binom{r-1}{n-1}$.)


One more important operation on formal power series is infinite product. Let $f_1, f_2, \ldots$ be formal
power series with constant term 0. Then the product

\[ \prod_{n \ge 1} (1 + f_n(t)) \]


2 Just as ‘Zorn's Lemma’ is an axiom of set theory, and ‘Bertrand's Postulate’ is a theorem. 
should be defined by taking, in all possible ways, either 1 or $f_n$ from the nth factor, multiplying these
together, and adding the resulting terms. To avoid having to multiply infinitely many non-trivial
terms, we specify that we choose 1 from all but finitely many of the factors; this gives a sum over all
finite sets of natural numbers. There is still a potential problem; we have to ensure that only finitely
many terms contribute to the coefficient of any given power of t. This will be true, for example, if
$f_n(t)$ contains no terms of degree less than n in t. So, for example,

\[ \prod_{n \ge 1} (1 + t^n) \]

is defined (see Exercise 14). It can be shown that, if $\prod_{n \ge 1} (1 + f_n(t)) = 1 + g(t)$ is defined, then

\[ \log(1 + g(t)) = \sum_{n \ge 1} \log(1 + f_n(t)). \]


Suppose that F is the field of real or complex numbers. Then, if the sequence $(a_0, a_1, \ldots)$
grows no faster than exponentially, its generating function will have non-zero radius of convergence,³
and techniques of analysis can be used on it. However, for many interesting counting functions of
combinatorial interest, the growth is faster than exponential, and the series must be treated formally.
For example, the generating function for permutations is $\sum_{n \ge 0} n!\,t^n$. This diverges for all t ≠ 0, and
yet the coefficients in its inverse have combinatorial significance (see Exercise 13).


4.3. Linear recurrence relations with constant coefficients 


The procedure for solving a general linear recurrence relation with constant coefficients
is similar to that in the Fibonacci case. Consider the recurrence

\[ F(n) = a_1 F(n-1) + a_2 F(n-2) + \cdots + a_k F(n-k). \]

Using the first method, we try a solution of the form $F(n) = \alpha^n$; we find that α
must be a root of the polynomial equation

\[ x^k = a_1 x^{k-1} + a_2 x^{k-2} + \cdots + a_k. \]


If this characteristic equation has all its roots distinct, then we obtain k independent
solutions of the recurrence relation. Taking a linear combination of these, and fitting
k initial values of F, we get k linear equations in k unknowns; these equations have
a unique solution. So we have obtained the most general solution of the problem.
However, if the characteristic polynomial has repeated roots, then we don’t obtain
enough solutions. In this case, suppose that α is a root of the characteristic equation
with multiplicity d. Then it can be verified that the d functions $\alpha^n, n\alpha^n, \ldots, n^{d-1}\alpha^n$
are all solutions of the recurrence relation. Doing this for every root, we again find
enough independent solutions that k initial values can be fitted.

3 Recall from analysis that the radius of convergence of the power series $\sum_{n \ge 0} a_n t^n$ is given by
\[ R = 1/\limsup_{n \to \infty} |a_n|^{1/n}. \]
The series converges for |t| < R and diverges for |t| > R.
The justification of this is the fact that the solutions claimed can be substituted
in the recurrence relation and its truth verified.


EXAMPLE. Solve the recurrence relation
\[ F(n) = 3F(n-2) - 2F(n-3) \]
with initial values F(0) = 3, F(1) = 1, F(2) = 8.
The characteristic equation is
\[ x^3 = 3x - 2, \]
with solutions x = 1, 1, -2. So the general solution of the recurrence relation is
\[ F(n) = a(-2)^n + bn + c. \]

To fit the initial conditions, we require a = b = 1, c = 2, so the solution is
\[ F(n) = (-2)^n + n + 2. \]
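The solution of the example can be checked against the recurrence directly. Here is a small Python sketch; the function names `F_recurrence` and `F_closed` are chosen for illustration.

```python
def F_recurrence(n):
    """F(n) computed from F(n) = 3F(n-2) - 2F(n-3), with F(0) = 3, F(1) = 1, F(2) = 8."""
    values = [3, 1, 8]
    for m in range(3, n + 1):
        values.append(3 * values[m - 2] - 2 * values[m - 3])
    return values[n]

def F_closed(n):
    """The claimed solution F(n) = (-2)^n + n + 2."""
    return (-2) ** n + n + 2

print([F_recurrence(n) for n in range(8)])
print([F_closed(n) for n in range(8)])
```

Agreement on the three initial values together with the recurrence is, by the Fact of Section 4.1, enough to force agreement everywhere; the loop merely confirms this for small n.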


4.4. Derangements and involutions 


For linear recurrences with non-constant terms, or for non-linear recurrences, there 
is no general method which always works. Sometimes it is possible to solve such 
relations, either by guessing a solution (and verifying that it works), or by some other 
method. We give a couple of examples. In the first case, we solve the recurrence; in 
the second, we will merely derive some information about the solution. 


EXAMPLE: DERANGEMENTS. A derangement of 1,2,...,n is a permutation of this set 
which leaves no point fixed. In Chapter 1, you were asked to calculate the number 
of derangements for n < 5. Now we will find the general formula. (This will be done 
again in Chapter 5 to illustrate a different technique, the Principle of Inclusion and 
Exclusion.) 


Let d(n) be the number of derangements of {1,...,n}. Any derangement moves 
the point n to some point i < n. Clearly, the same number of derangements is 
obtained for each value of i from 1 to n — 1; so we will find d(n) by computing the 
number of derangements that map n to i and multiplying by n — 1. 

Let π be a derangement with nπ = i. (Remember that permutations act on the
right!) There are two cases:

CASE 1: iπ = n. In other words, π interchanges n and i. Now π operates on the
remaining n − 2 points as a derangement. Furthermore, given any derangement of
the points different from i and n, we may extend it to interchange i and n, and
obtain a derangement of the entire set. So the number of derangements of this type
is d(n − 2).

CASE 2: iπ ≠ n; say, jπ = n for some j ≠ i. Now define a permutation π′ of
{1, ..., n − 1} by the rule

$$x\pi' = \begin{cases} x\pi & \text{if } x \ne j, \\ i & \text{if } x = j. \end{cases}$$

Then π′ is a derangement. Any derangement π′ of {1, ..., n − 1} can be ‘extended’ to
a derangement π of {1, ..., n}, by reversing the construction. So there are d(n − 1)
derangements under this case.
So we obtain
$$d(n) = (n-1)\bigl(d(n-1) + d(n-2)\bigr).$$
This is a three-term recurrence relation. The initial values are given by d(0) = 1,
d(1) = 0.


(4.4.1) Theorem. The number d(n) of derangements of an n-set is given by

$$d(n) = n! \left( \sum_{i=0}^{n} \frac{(-1)^i}{i!} \right).$$

This is the nearest integer to n!/e for n ≥ 1, where e is the base of natural logarithms.


Remark. This demonstrates the claim made in Chapter 1, that if n letters are
randomly distributed among n addressed envelopes, the probability that no letter
is correctly addressed is close to 1/e. (The problem asks for the probability that a 
random permutation is a derangement; this is d(n)/n!.) 


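To prove the theorem, we must show that the two sides of the equation satisfy
the same recurrence relation and have the same initial values. So let
$f(n) = n! \sum_{i=0}^{n} (-1)^i / i!$. Then

$$f(0) = 1 = d(0), \qquad f(1) = 0 = d(1).$$

Also

$$\begin{aligned}
(n-1)\bigl(f(n-1) + f(n-2)\bigr)
&= (n-1) \cdot (n-1)! \sum_{i=0}^{n-1} \frac{(-1)^i}{i!} + (n-1) \cdot (n-2)! \sum_{i=0}^{n-2} \frac{(-1)^i}{i!} \\
&= (n-1) \cdot \bigl((n-1)! + (n-2)!\bigr) \sum_{i=0}^{n-2} \frac{(-1)^i}{i!} + (n-1) \cdot (-1)^{n-1} \\
&= n! \left( \sum_{i=0}^{n-2} \frac{(-1)^i}{i!} + \frac{(-1)^{n-1}}{(n-1)!} + \frac{(-1)^n}{n!} \right) \\
&= f(n),
\end{aligned}$$

since $(n-1)\bigl((n-1)! + (n-2)!\bigr) = n!$ and n − 1 = n!/(n − 1)! − n!/n!.

So the equality is established.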

Now the Taylor series for $e^{-1}$ is $\sum_{i \ge 0} (-1)^i / i!$. Since this series has terms of
alternating sign and decreasing in absolute value, the difference between the sum of
the terms up to i = n and the limit is less than the absolute value of the next term,
which is 1/(n + 1)!. So

$$\left| \frac{n!}{e} - d(n) \right| = n! \left| \sum_{i \ge n+1} \frac{(-1)^i}{i!} \right| < \frac{n!}{(n+1)!} = \frac{1}{n+1} \le \frac{1}{2} \quad \text{for } n \ge 1.$$

So n!/e differs from the integer d(n) by less than 1/2. It follows that d(n) is the
nearest integer to n!/e, as claimed.
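The three descriptions of d(n) can be compared numerically. The sketch below is illustrative only (the function names are my own): it computes d(n) from the recurrence, from the alternating sum in exact rational arithmetic, and by brute-force enumeration for small n, and then checks the nearest-integer statement using floating-point arithmetic, which is adequate for the small values of n used here.

```python
from fractions import Fraction
from itertools import permutations
from math import e, factorial


def derangements_by_recurrence(n_max):
    """d(0..n_max) from d(n) = (n-1)(d(n-1) + d(n-2)), d(0) = 1, d(1) = 0."""
    d = [1, 0]
    for n in range(2, n_max + 1):
        d.append((n - 1) * (d[n - 1] + d[n - 2]))
    return d[: n_max + 1]


def derangements_by_formula(n):
    """n! times the alternating sum of 1/i!, computed exactly."""
    total = sum(Fraction((-1) ** i, factorial(i)) for i in range(n + 1))
    return int(factorial(n) * total)


def derangements_by_brute_force(n):
    """Count permutations of {0, ..., n-1} with no fixed point."""
    return sum(all(p[i] != i for i in range(n)) for p in permutations(range(n)))


d = derangements_by_recurrence(10)
print(d)
print([round(factorial(n) / e) for n in range(1, 11)])
```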


EXAMPLE: INVOLUTIONS. Here is an example of a naturally occurring sequence, with a 
simple recurrence relation where we won't find a simple formula either for the terms 
in the sequence itself or for a generating function for them (but see Exercise 18); 
however, we can get quite precise information just using the recurrence relation. 


PROBLEM. How many permutations are there of a set of n elements having the 
property that all their cycles have length 1 or 2? 

Let s(n) be the number of permutations satisfying this condition. The cycles of a
permutation refer to its expression as a product of disjoint cycles,
found in the usual way. For example, s(3) = 4, counting the permutations (1)(2)(3),
(1 2)(3), (1 3)(2) and (2 3)(1). Similarly, s(2) = 2, and s(1) = 1. (What is s(0)?)

As usual, we assume that the n-set is {1, 2, ..., n}, and divide the permutations
into two classes:

• Those which fix the point n. These act on the set {1, ..., n − 1} as permutations
with all cycles of length 1 or 2, so there are s(n − 1) of them.

• Those which don’t fix n. If such a permutation moves n to i, say, then by
assumption it contains a cycle (n i), and it acts on the n − 2 points other than n
and i as a permutation with all cycles of length 1 or 2. There are n − 1 choices
for i, and for each choice, s(n − 2) choices for the permutation.

So we have the recurrence relation

$$s(n) = s(n-1) + (n-1)s(n-2).$$

This recurrence relation makes the calculation of further values easy. For
example,
$$s(4) = 4 + 3 \cdot 2 = 10, \qquad s(5) = 10 + 4 \cdot 4 = 26, \qquad s(6) = 26 + 5 \cdot 10 = 76.$$
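The values just computed can be confirmed by direct enumeration. In the sketch below (names are illustrative), the recurrence is started from the stated values s(1) = 1 and s(2) = 2, and the brute-force count uses the observation that a permutation has all cycles of length 1 or 2 exactly when applying it twice gives the identity.

```python
from itertools import permutations
from math import factorial


def involution_counts(n_max):
    """Return {n: s(n)} for 1 <= n <= n_max, from s(n) = s(n-1) + (n-1)s(n-2)."""
    s = {1: 1, 2: 2}
    for n in range(3, n_max + 1):
        s[n] = s[n - 1] + (n - 1) * s[n - 2]
    return {n: s[n] for n in range(1, n_max + 1)}


def count_by_brute_force(n):
    """Count permutations p of {0, ..., n-1} with p(p(i)) = i for every i."""
    return sum(all(p[p[i]] == i for i in range(n)) for p in permutations(range(n)))


s = involution_counts(12)
print([s[n] for n in range(1, 7)])
# Evenness, and the comparison of s(n)^2 with n!, checked for 2 <= n <= 12.
print(all(s[n] % 2 == 0 and s[n] ** 2 > factorial(n) for n in range(2, 13)))
```

Comparing s(n)² with n! avoids square roots, so the check stays in exact integer arithmetic.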


We demonstrate the following properties of the numbers s(n): 


(4.4.2) Proposition. (a) s(n) is even for all n > 1;
(b) $s(n) > \sqrt{n!}$ for all n > 1.


PROOF. Both statements are proved by induction, being easily verified for n = 2, 3.
(i) If s(n − 1) and s(n − 2) are even, then s(n) = s(n − 1) + (n − 1)s(n − 2) is
even. So induction applies.


(ii) Suppose that $s(n-1) > \sqrt{(n-1)!}$ and $s(n-2) > \sqrt{(n-2)!}$. Then

$$\begin{aligned}
s(n) &= s(n-1) + (n-1)s(n-2) \\
&> \sqrt{(n-1)!} + (n-1)\sqrt{(n-2)!} \\
&= \sqrt{(n-1)!}\,\bigl(1 + \sqrt{n-1}\bigr) \\
&> \sqrt{(n-1)!} \cdot \sqrt{n} \qquad (*) \\
&= \sqrt{n!},
\end{aligned}$$

and the induction goes through. (In (*), we have used the fact that
$$1 + \sqrt{n-1} > \sqrt{n},$$
which is true because $(1 + \sqrt{n-1})^2 = n + 2\sqrt{n-1}$.)

REMARKS. 1. The second inequality is actually quite a good estimate. 

2. The evenness of s(n) is a special case of a general group-theoretic fact, in the
case where G is the symmetric group Sym(n) of all permutations of {1, ..., n}: In
a finite group G of even order n, the number of solutions of $x^2 = 1$ is even. This is
because the elements y for which $y^2 \ne 1$ come in pairs $\{y, y^{-1}\}$, and so are even in
number.


4.5. Catalan and Bell numbers 


In this section, we look at two important sequences of numbers. They have several, 
apparently accidental, common properties: both are ‘named’; they start out similarly 
(the Catalan numbers are 1, 2, 5, 14, 42, ..., while the Bell numbers are 1, 2, 5, 15, 
52, ...); and both are given by recurrence relations. 

The Catalan numbers appear in many guises throughout combinatorics and
computer science.⁴ Here is a typical application:


In how many ways can a sum of n terms be bracketed so that it 
can be calculated by adding two terms at a time? 


For example, if n = 4, there are five possibilities: 


(((a + b) + c) + d),
((a + (b + c)) + d),
(a + ((b + c) + d)),
(a + (b + (c + d))),
((a + b) + (c + d)).


4 And elsewhere. Two of my colleagues, independently, asked me about the Catalan numbers which 
had come up in their research. One studies non-linear dynamics; the other, Lie superalgebras. 
We have ‘normalised’ by enclosing the entire expression in an extra pair of brackets.
(Note that, in an algebraic system where the operation is non-associative, these
expressions could all have different values.)

Let $C_n$ be the number of ways of bracketing a sum of n terms. To obtain a
recurrence relation for $C_n$, note that any bracketed expression has the form $(E_1 + E_2)$,
where $E_1$ and $E_2$ are bracketed expressions with (say) i and n − i terms, for some i
satisfying 1 ≤ i ≤ n − 1. There are $C_i$ choices for $E_1$, and $C_{n-i}$ for $E_2$. Summing
over i, we obtain our first example of a non-linear recurrence relation:


(4.5.1) Recurrence relation for Catalan numbers
For n > 1,

$$C_n = \sum_{i=1}^{n-1} C_i C_{n-i}.$$


Let $F(t) = \sum_{n \ge 0} C_n t^n$ be the generating function. (By convention, we take
$C_0 = 0$; also, $C_1 = 1$, which is the start of the recurrence.) The recurrence relation
shows that the terms in $t^2$ and higher powers of t in $F(t)^2$ are equal to those of
F(t). However, because the constant term is zero, $F(t)^2$ has no term in t. Thus, we
have

$$F(t) = t + F(t)^2.$$

Re-writing this as a quadratic equation and solving, we obtain

$$F(t) = \tfrac{1}{2}\bigl(1 \pm (1 - 4t)^{1/2}\bigr).$$

Because F(0) = 0, we must choose the minus sign in the solution. Now, from the
Binomial Theorem, we can read off the coefficient of $t^n$:

$$\begin{aligned}
C_n &= -\frac{1}{2} \binom{1/2}{n} (-4)^n \\
&= -\frac{1}{2} \cdot \frac{\frac{1}{2} \cdot \left(-\frac{1}{2}\right) \cdot \left(-\frac{3}{2}\right) \cdots \left(-\frac{2n-3}{2}\right)}{n!} \cdot (-4)^n \\
&= \frac{(2n-2)!}{(n-1)!\,n!}.
\end{aligned}$$

(In the above expression, there are n + 1 twos in the denominator, and $4^n / 2^{n+1} = 2^{n-1}$.
Then the product of all odd numbers from 1 to 2n − 3 is equal to $(2n-2)! / (2^{n-1}(n-1)!)$.
Moreover, the minus signs cancel, since there are 2n of them altogether.) Thus:


(4.5.2) Catalan numbers

$$C_n = \frac{1}{n} \binom{2n-2}{n-1}.$$

See the Exercises for other combinatorial interpretations of Catalan numbers. 
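As a check on the closed formula, the following sketch (function names are my own, chosen for illustration) computes the Catalan numbers from the recurrence (4.5.1), compares them with (4.5.2), and also lists the bracketings of a short sum explicitly by splitting it into a left part and a right part in every possible way.

```python
from math import comb


def catalan_by_recurrence(n_max):
    """C_0..C_{n_max}, with the convention C_0 = 0 and C_1 = 1."""
    c = [0, 1]
    for n in range(2, n_max + 1):
        c.append(sum(c[i] * c[n - i] for i in range(1, n)))
    return c[: n_max + 1]


def catalan_closed_form(n):
    """(1/n) * binomial(2n-2, n-1), for n >= 1."""
    return comb(2 * n - 2, n - 1) // n


def bracketings(terms):
    """All ways of bracketing the sum of the given terms, two at a time."""
    if len(terms) == 1:
        return [terms[0]]
    result = []
    for i in range(1, len(terms)):
        for left in bracketings(terms[:i]):
            for right in bracketings(terms[i:]):
                result.append(f"({left} + {right})")
    return result


catalan = catalan_by_recurrence(10)
print(catalan[1:])
for expression in bracketings(["a", "b", "c", "d"]):
    print(expression)
```

The enumeration mirrors the argument used to derive the recurrence: the outermost pair of brackets determines the split point i.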
We encountered the Bell numbers briefly in Chapter 3. The Bell number $B_n$ is
the number of partitions of a set of size n. We proved there that it satisfies the
recurrence relation

$$B_n = \sum_{i=1}^{n} \binom{n-1}{i-1} B_{n-i},$$

with the convention that $B_0 = 1$. This recurrence is linear, but involves all the
preceding terms, rather than a fixed number.

There is no simple closed formula for $B_n$, but there is a nice expression for its
generating function, which we now derive. This is a type of generating function we
haven't met before. The exponential generating function, or e.g.f., of the sequence
$(a_0, a_1, \ldots)$ is the formal power series

$$\sum_{n \ge 0} \frac{a_n t^n}{n!}.$$

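The name comes from the fact that the e.g.f. of the all-1 sequence is just the ordinary
exponential function exp(t). We will see in Part 2 that the exponential generating
function is well suited to counting labelled objects, in the sense introduced in
Chapter 2. Note that, if $F(t) = \sum_{n \ge 0} a_n t^n / n!$, then the derivative is
$\frac{d}{dt} F(t) = \sum_{n \ge 1} a_n t^{n-1} / (n-1)!$; this is the e.g.f. of the sequence with the first term deleted.

Let $F(t) = \sum_{n \ge 0} B_n t^n / n!$ be the e.g.f. of the Bell numbers. Take the recurrence
relation, multiply by $t^{n-1} / (n-1)!$, and sum over n, to obtain

$$\begin{aligned}
\frac{d}{dt} F(t) &= \sum_{n \ge 1} \sum_{i=1}^{n} \binom{n-1}{i-1} \frac{B_{n-i}\, t^{n-1}}{(n-1)!} \\
&= \sum_{n \ge 1} \sum_{i=1}^{n} \frac{t^{i-1}}{(i-1)!} \cdot \frac{B_{n-i}\, t^{n-i}}{(n-i)!} \\
&= \left( \sum_{j \ge 0} \frac{t^j}{j!} \right) \left( \sum_{k \ge 0} \frac{B_k t^k}{k!} \right) \\
&= \exp(t) F(t).
\end{aligned}$$

(In the penultimate line, we changed dummy variables to j = i − 1 and k = n − i; as
n runs from 1 to ∞, and i from 1 to n, j and k independently take all non-negative
integer values.)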

Now we have

$$\frac{d}{dt} \bigl( \exp(-\exp(t)) F(t) \bigr) = 0,$$

so F(t) = c exp(exp(t)) for some constant c. Using the fact that F(0) = 1, we find
that c = exp(−1); so


(4.5.3) E.g.f. for Bell numbers.

$$\sum_{n \ge 0} \frac{B_n t^n}{n!} = \exp(\exp(t) - 1).$$
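The generating function can be tested against the recurrence by expanding exp(exp(t) − 1) as a truncated power series with exact rational coefficients. This is only a sketch under simple assumptions (the helper names are my own): because exp(t) − 1 has zero constant term, its m-th power contributes nothing below degree m, so summing the powers up to the truncation degree gives the exact coefficients.

```python
from fractions import Fraction
from math import comb, factorial


def bell_by_recurrence(n_max):
    """B_0..B_{n_max} from B_n = sum_{i=1}^{n} C(n-1, i-1) B_{n-i}, B_0 = 1."""
    b = [1]
    for n in range(1, n_max + 1):
        b.append(sum(comb(n - 1, i - 1) * b[n - i] for i in range(1, n + 1)))
    return b


def multiply_truncated(p, q, degree):
    """Product of two coefficient lists, discarding terms above the given degree."""
    product = [Fraction(0)] * (degree + 1)
    for i, p_i in enumerate(p):
        for j, q_j in enumerate(q):
            if i + j <= degree:
                product[i + j] += p_i * q_j
    return product


def bell_from_egf(n_max):
    """n! times the coefficient of t^n in exp(exp(t) - 1), for n = 0..n_max."""
    g = [Fraction(0)] + [Fraction(1, factorial(k)) for k in range(1, n_max + 1)]
    series = [Fraction(0)] * (n_max + 1)
    power = [Fraction(1)] + [Fraction(0)] * n_max   # g(t)^0
    for m in range(n_max + 1):
        for k in range(n_max + 1):
            series[k] += power[k] / factorial(m)    # add g(t)^m / m!
        power = multiply_truncated(power, g, n_max)
    return [int(series[n] * factorial(n)) for n in range(n_max + 1)]


print(bell_by_recurrence(8))
print(bell_from_egf(8))
```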
4.6. Computing solutions to recurrence relations


In principle, nothing could be simpler than computing, say, Fibonacci numbers from
their recurrence relation. Knowing that $F_0 = F_1 = 1$, we find
$F_2 = 1 + 1 = 2$, then $F_3$, and so on. For example,


$F_{1000}$ = 70,330,367,711,422,815,821,835,254,877,183,549,770,181,
269,836,358,732,742,604,905,087,154,537,118,196,933,579,
742,249,494,562,611,733,487,750,449,241,765,991,088,186,
363,265,450,223,647,106,012,053,374,121,273,867,339,111,
198,139,373,125,598,767,690,091,902,245,245,323,403,501


takes just 999 additions to compute.

However, there is an important point to consider. It is tempting to program the 
calculation exactly as the sequence is defined; that is, to define a function F on the 
natural numbers by the rules 

• F(0) = F(1) = 1;

• F(n) = F(n − 1) + F(n − 2) for n > 1.
But this is not wise. Let us trace the calculation of F(4). We find that F(4) =
F(2) + F(3). First, we evaluate F(2) = F(0) + F(1) = 1 + 1 = 2. Next, we evaluate
F(3) = F(1) + F(2). Now the computer does not realise that F(2) has already been 
calculated; it throws away its rough working. So we have to repeat the computation 
F(2) = F(0) + F(1) = 1 +1 = 2 before we can find F(3) = 1+ 2 = 3 and finally 
F(4) = 2+3 = 5. For larger arguments, the amount of repeated labour grows 
exponentially (see Exercise 7). 
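The repeated labour is easy to observe by instrumenting the naive definition. In this sketch (illustrative names; the counter is simply a one-element list passed along the recursion), each call that is not a base case performs one addition, and a memoised version is given for contrast using the standard library's `functools.lru_cache`.

```python
from functools import lru_cache


def naive_fib(n, counter):
    """F(n) computed exactly as defined; counter[0] records the additions made."""
    if n < 2:
        return 1
    counter[0] += 1                     # one addition for this call
    return naive_fib(n - 1, counter) + naive_fib(n - 2, counter)


@lru_cache(maxsize=None)
def memo_fib(n):
    """The same definition, but earlier results are remembered."""
    return 1 if n < 2 else memo_fib(n - 1) + memo_fib(n - 2)


def additions_used(n):
    counter = [0]
    naive_fib(n, counter)
    return counter[0]


for n in (4, 10, 20):
    print(n, memo_fib(n), additions_used(n))
```

Note that the memoised version is still recursive, so for very large arguments an iterative table is the safer choice in Python.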

So it is important to tell the computer to remember earlier results. For this,
define an array of numbers $(F_0, F_1, \ldots, F_{1000})$ (if the largest Fibonacci number we'll
need is $F_{1000}$), with the first two entries equal to 1, and each subsequent entry equal
to the sum of the two before it.

This consideration applies to any sequence of numbers defined by a recurrence
relation of any sort. In the specific case of Fibonacci numbers, if we only need one
number $F_n$ rather than the whole sequence $F_0, \ldots, F_n$, it’s possible to economise on
storage space. We only need to remember two numbers, say x and y (and a counter
n). Start with x = y = 1 and n = 1. Now, in a single step,

• increase n by 1;
• calculate x + y, and replace either x or y by this number according as n is even
or odd.
The last number written (viz., x or y depending on the parity of n) is the nth
Fibonacci number.
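Both schemes are short to write down. The following sketch (function names are illustrative) builds the table, and then carries out the two-variable procedure exactly as described, overwriting x on even steps and y on odd steps.

```python
def fib_table(n_max):
    """The array (F_0, ..., F_{n_max}) with F_0 = F_1 = 1."""
    table = [1, 1]
    for n in range(2, n_max + 1):
        table.append(table[n - 1] + table[n - 2])
    return table[: n_max + 1]


def fib_two_variables(n_target):
    """F_{n_target}, remembering only two numbers and a counter."""
    if n_target < 2:
        return 1
    x = y = 1
    n = 1
    while n < n_target:
        n += 1
        if n % 2 == 0:
            x = x + y       # x now holds F_n for even n
        else:
            y = x + y       # y now holds F_n for odd n
    return x if n % 2 == 0 else y


print(fib_table(10))
print(fib_two_variables(1000))
```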


It is possible to calculate $F_n$ faster than this, using only c log n arithmetic
operations, by means of the ‘Russian peasant multiplication’ trick. Exercise 8 gives details.
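One possible arrangement of such a calculation is sketched below. It is not taken from the text: it assumes the identity of Exercise 8(a) with the convention $F_{-1} = 0$, specialised to doubling the index, and the function names are my own. The pair $(F_{m-1}, F_m)$ for m = ⌊n/2⌋ is turned into the pair for n using a fixed number of multiplications and additions, so the number of arithmetic operations grows with the number of binary digits of n.

```python
def fib_pair(n):
    """Return (F_{n-1}, F_n), where F_0 = F_1 = 1 and F_{-1} = 0."""
    if n == 0:
        return (0, 1)
    a, b = fib_pair(n // 2)             # (F_{m-1}, F_m) with m = n // 2
    f_2m = a * a + b * b                # F_{2m} = F_m^2 + F_{m-1}^2
    if n % 2 == 0:
        return (a * (2 * b - a), f_2m)  # (F_{2m-1}, F_{2m})
    return (f_2m, b * (b + 2 * a))      # (F_{2m}, F_{2m+1})


def fast_fib(n):
    return fib_pair(n)[1]


print([fast_fib(n) for n in range(10)])
```

Each operation here acts on numbers with many digits, which is the point raised in part (c) of Exercise 8.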


5 On the other hand, if we were to use the formula, we would be faced with the need to calculate
$((\sqrt{5} + 1)/2\sqrt{5})((\sqrt{5} + 1)/2)^{1000}$ to such high accuracy that the final answer is guaranteed to have
an error of less than 0.5, a much more difficult task!
4.7. Project: Finite fields and QUICKSORT


In this section, we will work through two more elaborate applications of recurrence
relations and generating functions. We prove the existence of irreducible
polynomials over finite fields; and we calculate the average number of comparisons
needed to sort a list using QUICKSORT.⁶


IRREDUCIBLE POLYNOMIALS AND FINITE FIELDS.
If p is a prime, the integers modulo p form a field: addition, subtraction, multiplication and division
(except by zero) are defined, and the commutative, associative and distributive laws hold. What other
finite fields exist?
This question was answered by Galois in the nineteenth century.⁷ He proved the following
result:


(4.7.1) Galois’ Theorem 
The number of elements in a finite field is a prime power; and, for any prime 
power q, there is a unique field with q elements. 


The field with q elements is called the Galois field of order q, denoted by GF(q). Thus, if p is
prime, then GF(p) = Z/(p), the integers mod p. Suppose that $q = p^n$. Then a field of order q is
constructed from a polynomial

$$f(x) = x^n + b_{n-1} x^{n-1} + \cdots + b_1 x + b_0$$

over Z/(p), which has degree n, is monic (leading coefficient 1), and is irreducible: the elements of
the field are the $p^n$ expressions

$$c_0 + c_1 \alpha + \cdots + c_{n-1} \alpha^{n-1}$$

for $c_0, c_1, \ldots, c_{n-1} \in$ Z/(p); addition and multiplication are defined in the obvious way, but setting
$f(\alpha) = 0$ where necessary to reduce the degree of any expression to n − 1 or less. (Compare the
construction of the complex numbers as the set of objects of the form a + bi for a, b ∈ R, where
$i^2 = -1$; note that the polynomial $x^2 + 1$ is irreducible over R.)


The point of this brief discussion is that the existence of finite fields will follow if we can show
that there is an irreducible polynomial of any possible degree over Z/(p). We will prove this in
the most naive way possible, by counting the polynomials. We need one algebraic fact: a monic
polynomial over a field can be factorised into monic irreducible factors, uniquely up to the order of the
factors.

Fix a prime power q, and let F be a field of order q. (For Galois’ Theorem, take q = p prime,
and F = Z/(p).) Let $a_n$ be the number of monic irreducible polynomials of degree n over F. The
total number of monic polynomials of degree n is $q^n$, since each of the n coefficients $b_{n-1}, \ldots, b_1, b_0$
can be chosen arbitrarily from F.


6 I am indebted to Colin McDiarmid for the second example. 

7 Évariste Galois was killed in a duel at the age of 20. The night before the duel, he had written
all his recent mathematical discoveries in a hastily scrawled letter to a friend; this document can be 
regarded as the foundation of modern algebra, though its influence was not felt until its publication 
by Liouville fifteen years later. The theorem on finite fields, however, is one of the few pieces of his 
work published during his lifetime. 
Now an arbitrary polynomial has a unique factorisation into irreducibles. Consider those
polynomials which have $m_1$ factors of degree 1, $m_2$ of degree 2, and so on. We must have
$m_1 + 2m_2 + \cdots = n$. The $m_i$ factors of degree i are chosen from the set of $a_i$ irreducibles of degree
i; repetition is allowed, and the order of the factors is not important. By (3.7.1), there are
$\binom{a_i + m_i - 1}{m_i}$ choices for these factors, and hence

$$\prod_{i \ge 1} \binom{a_i + m_i - 1}{m_i}$$

polynomials with a factorisation of this shape. So, counting all monic polynomials of degree n, we
have

$$\sum_{m_1 + 2m_2 + \cdots = n} \; \prod_{i \ge 1} \binom{a_i + m_i - 1}{m_i} = q^n. \qquad (*)$$

This is a recurrence relation (albeit a highly complicated, non-linear one). We illustrate the case
q = 2.

$$\begin{aligned}
a_1 &= 2, \\
a_2 + \binom{a_1 + 1}{2} &= 4, \\
a_3 + a_1 a_2 + \binom{a_1 + 2}{3} &= 8, \\
a_4 + a_1 a_3 + \binom{a_2 + 1}{2} + \binom{a_1 + 1}{2} a_2 + \binom{a_1 + 3}{4} &= 16,
\end{aligned}$$

from which we obtain successively $a_1 = 2$, $a_2 = 1$, $a_3 = 2$, $a_4 = 3$.


The point of this section is that, by sleight-of-hand with generating functions, we transform this
recurrence relation into a very much simpler one, from which (for example) the fact that $a_n > 0$ can
be seen directly.


Multiply equation (*) by $t^n$ and sum over n:

$$\begin{aligned}
\frac{1}{1 - qt} &= \sum_{n \ge 0} q^n t^n \\
&= \sum_{n \ge 0} t^n \sum_{m_1 + 2m_2 + \cdots = n} \; \prod_{i \ge 1} \binom{a_i + m_i - 1}{m_i} \\
&= {\sum}'_{m_1, m_2, \ldots} \; \prod_{i \ge 1} \binom{a_i + m_i - 1}{m_i} t^{i m_i}.
\end{aligned}$$

(The last step needs a little explanation. If we sum over all n and then over all choices of $m_1, m_2, \ldots$
satisfying $m_1 + 2m_2 + \cdots = n$, we have simply summed over all sequences $(m_1, m_2, \ldots)$ with only
finitely many non-zero terms; this is what is meant by the prime on the summation sign. Furthermore,

$$t^n = \prod_{i \ge 1} t^{i m_i},$$

so the power of t can be split up as claimed.)
Now the main technical step: I claim that the above expression is equal to

$$\prod_{i \ge 1} \sum_{m \ge 0} \binom{a_i + m - 1}{m} t^{im}.$$

This is because, to evaluate an infinite product of this form, we choose one term from each factor
in all possible ways so that all but finitely many choices are equal to 1, multiply the chosen terms,
and add the results; say we choose the $m_i$th term from the ith factor, where finitely many $m_i$ are
non-zero. This gives just the sum previously described.
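Now note that

$$\binom{a + m - 1}{m} = (-1)^m \binom{-a}{m},$$

so, by the Binomial Theorem, we have

$$\sum_{m \ge 0} \binom{a_i + m - 1}{m} t^{im} = (1 - t^i)^{-a_i}.$$

So we have

$$(1 - qt)^{-1} = \prod_{i \ge 1} (1 - t^i)^{-a_i}.$$

Now comes the trick. We take logarithms of both sides:

$$-\log(1 - qt) = -\sum_{i \ge 1} a_i \log(1 - t^i),$$

whence

$$\sum_{n \ge 1} \frac{q^n t^n}{n} = \sum_{i \ge 1} a_i \sum_{k \ge 1} \frac{t^{ik}}{k}.$$

Now we equate the coefficients of $t^n$ on both sides. On the right, we obtain a term for each pair
(i, k) with ik = n; in other words, for each divisor i of n. Since k = n/i, this gives

$$\frac{q^n}{n} = \sum_{i \mid n} \frac{i a_i}{n}.$$

Multiplying by n gives

$$q^n = \sum_{i \mid n} i a_i. \qquad (**)$$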


This is our desired recurrence relation. It is linear, and has many fewer terms than (*). To re-do the
case q = 2:
$$\begin{aligned}
a_1 &= 2, \\
a_1 + 2a_2 &= 4, \\
a_1 + 3a_3 &= 8, \\
a_1 + 2a_2 + 4a_4 &= 16.
\end{aligned}$$

In Chapter 12, we will discuss Möbius inversion, and solve this recurrence relation explicitly. But,
in the meantime, observe that $q^n$ is the sum of at most n terms, of which all except $n a_n$ are at most
$q^{n/2}$ (since they occur in earlier recurrence relations). In general, $q^n > (n-1) q^{n/2}$; so $a_n > 0$. Thus,
there exists an irreducible polynomial of any degree over any finite field.
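The linear recurrence is easy to run, and for q = 2 its output can be compared with a direct count. The sketch below is illustrative only (the names and the bitmask representation are my own choices): a polynomial over the field with two elements is stored as an integer whose k-th binary digit is the coefficient of $x^k$, and a polynomial of degree n is counted as irreducible if no polynomial of degree between 1 and n/2 divides it.

```python
def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]


def irreducible_counts(q, n_max):
    """Return {n: a_n} for 1 <= n <= n_max, solving q^n = sum over d | n of d * a_d."""
    a = {}
    for n in range(1, n_max + 1):
        earlier = sum(d * a[d] for d in divisors(n) if d < n)
        a[n] = (q ** n - earlier) // n
    return a


def poly_mod2(f, g):
    """Remainder of f on division by g, coefficients mod 2, as bitmasks."""
    while f.bit_length() >= g.bit_length():
        f ^= g << (f.bit_length() - g.bit_length())
    return f


def count_irreducible_mod2(n):
    """Count the irreducible polynomials of degree n over the integers mod 2."""
    count = 0
    for f in range(1 << n, 1 << (n + 1)):           # leading coefficient 1
        trial_divisors = range(2, 1 << (n // 2 + 1))  # degrees 1 .. n // 2
        if all(poly_mod2(f, g) != 0 for g in trial_divisors):
            count += 1
    return count


print(irreducible_counts(2, 8))
print([count_irreducible_mod2(n) for n in range(1, 9)])
```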

With a little more algebra, the recurrence relation (**) can be used to show the uniqueness in
Galois’ Theorem as well. (In outline: one shows that any element of a field of order $q^n$ satisfies an
irreducible polynomial over the subfield of order q whose degree divides n. Now the $a_i$ irreducible
polynomials of degree i have at most $i a_i$ roots; and (**) shows that these roots are just sufficient in
number to comprise one field of order $q^n$.)

It is instructive to compare the very different proof of Galois’ Theorem normally given in algebra
text-books. It is possible to use that proof, and the counting of roots as in the preceding paragraph,
to give another proof of (**).


THE PERFORMANCE OF QUICKSORT. 


A great deal of computer time is spent in sorting lists, that is, arranging the elements in order, if they
were originally arranged haphazardly. It is important to be able to do this efficiently, and to estimate
how complex a task it is.

Many important algorithms are recursive: they solve a given problem instance by reducing it to
smaller instances of the same problem. Thus, the average (or longest) time taken to solve a problem
of size n can be expressed in terms of the time for smaller problems, giving rise to a recurrence
relation. As an example, I will calculate the average number of comparisons taken by Hoare’s
QUICKSORT algorithm to sort a randomly ordered list of n items.
The algorithm is defined as follows.


(4.7.2) QUICKSORT
to sort a list L
• Let a be the first item of the list.
• Partition the remainder of the list into sublists L⁻, L⁺ consisting of the
elements less than, greater than a respectively.
• Sort L⁻ and L⁺.
• Return (L⁻ (sorted), a, L⁺ (sorted)).
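The steps above can be sketched in a few lines. This is a minimal illustration rather than a tuned implementation; it assumes the items are distinct (as for a list in random order with all orderings equally likely), and it returns the number of comparisons alongside the sorted list so that the counts discussed below can be observed. The function name and the returned pair are my own conventions.

```python
def quicksort(items):
    """Return (sorted list, number of comparisons) for a list of distinct items."""
    if len(items) <= 1:
        return list(items), 0
    pivot = items[0]                    # a, the first item of the list
    smaller, larger = [], []
    for item in items[1:]:
        if item < pivot:                # one comparison per remaining item
            smaller.append(item)
        else:
            larger.append(item)
    sorted_smaller, count_smaller = quicksort(smaller)
    sorted_larger, count_larger = quicksort(larger)
    total = (len(items) - 1) + count_smaller + count_larger
    return sorted_smaller + [pivot] + sorted_larger, total


print(quicksort([3, 1, 4, 5, 9, 2, 6]))
```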


We will calculate the average number of comparisons of individual elements which have to be
made, assuming that the algorithm is presented with a list in random order (that is, all orderings
equally likely). But first, what answer do we expect? There are n! possible orderings; since each
comparison can at best narrow down the number of possibilities to half the previous value (on
average), we would expect to need at least $\log_2 n!$ comparisons.⁸ By (3.6.1),

$$\log_2 n! = n \log n / \log 2 + O(n) = 1.4427\ldots\, n \log n + O(n).$$

We will show that the average number of comparisons required by QUICKSORT is only a constant
factor worse than this lower bound, namely 2n log n + O(n).
The crucial observation is that, if the list L is in random order, then
• the first element a is equally likely to be the first, second, ..., nth smallest element;
• the sublists L⁻ and L⁺ are randomly ordered (i.e. all orderings equally likely).
Let $q_n$ be the average number of comparisons required to sort a list of length n. Thus we have

$$q_n = n - 1 + \frac{1}{n} \sum_{k=1}^{n} (q_{k-1} + q_{n-k}).$$

(The first step requires n − 1 comparisons; if a is the kth smallest element, the second step requires
an average of $q_{k-1} + q_{n-k}$ comparisons, and this number has to be averaged over the possible values
of k.) We can simplify this to

$$q_n = n - 1 + \frac{2}{n} \sum_{i=0}^{n-1} q_i,$$

since each of $q_0, \ldots, q_{n-1}$ occurs twice in the sum.

The initial value is clearly $q_0 = 0$.
To solve this recurrence relation, we find a differential equation for its generating function. Let

$$Q(t) = \sum_{n \ge 0} q_n t^n$$

be the generating function. Multiplying the recurrence relation by $n t^n$ and summing gives

$$\sum_{n \ge 0} n q_n t^n = \sum_{n \ge 0} n(n-1) t^n + 2 \sum_{n \ge 0} \left( \sum_{i=0}^{n-1} q_i \right) t^n.$$

We analyse the three terms. The second is just the Taylor series of $2t^2/(1-t)^3$. (We have

$$\sum_{n \ge 0} n(n-1) t^{n-2} = 2/(1-t)^3,$$

most easily by differentiating twice the series for 1/(1 − t), or alternatively by the Binomial Theorem.)
The first term is tQ′(t), since

$$Q'(t) = \sum_{n \ge 0} n q_n t^{n-1}.$$

The last term is the most difficult; I claim that it is 2tQ(t)/(1 − t). This is because

$$(t/(1-t)) Q(t) = (t + t^2 + t^3 + \cdots)(q_0 + q_1 t + q_2 t^2 + q_3 t^3 + \cdots),$$

and the $t^n$ term is obtained by multiplying $t^{n-i}$ from the first factor and $q_i t^i$ from the second, and
summing over i.
Thus, we have

$$t Q'(t) = \frac{2t^2}{(1-t)^3} + \frac{2t}{1-t} Q(t).$$


8 This might be called the ‘Twenty Questions’ principle. For a proof, see Exercise 23.
9 The notation O(f(n)) means ‘a function whose absolute value is bounded by cf(n) for some
constant c, as n → ∞’.

This is a first-order linear differential equation, for which there is a standard method for solution.
Without going through the general case, we have

$$\frac{d}{dt} \bigl( (1-t)^2 Q(t) \bigr) = (1-t)^2 Q'(t) - 2(1-t) Q(t) = \frac{2t}{1-t},$$

so

$$(1-t)^2 Q(t) = -2(t + \log(1-t))$$

(using the fact that Q(0) = 0). Hence

$$Q(t) = \frac{-2(t + \log(1-t))}{(1-t)^2}.$$

It still seems a tall order to find the coefficients in this power series explicitly; but it can be done.
We have

$$Q(t) = 2 \left( \frac{t^2}{2} + \frac{t^3}{3} + \cdots \right) (1 + 2t + 3t^2 + \cdots),$$

so

$$q_n = 2 \sum_{i=2}^{n} \frac{n - i + 1}{i} = 2(n+1) \sum_{i=1}^{n} \frac{1}{i} - 4n.$$

This is an exact formula, though it involves a sum of n terms. We can produce an approximation by
using the fact that¹⁰

$$\sum_{i=1}^{n} \frac{1}{i} = \log n + O(1),$$

whence

$$q_n = 2n \log n + O(n),$$

as we promised.
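The exact formula can be checked against the recurrence, and both against a direct average over all orderings of a short list. The sketch below uses exact rational arithmetic; the function names are illustrative, and the comparison count charges n − 1 comparisons for partitioning a list of length n, as in the derivation above.

```python
from fractions import Fraction
from itertools import permutations


def average_by_recurrence(n_max):
    """q_0..q_{n_max} from q_n = n - 1 + (2/n) * (q_0 + ... + q_{n-1})."""
    q = [Fraction(0)]
    for n in range(1, n_max + 1):
        q.append(n - 1 + Fraction(2, n) * sum(q))
    return q


def average_by_formula(n):
    """2(n+1) * (1 + 1/2 + ... + 1/n) - 4n."""
    harmonic = sum(Fraction(1, i) for i in range(1, n + 1))
    return 2 * (n + 1) * harmonic - 4 * n


def comparisons(items):
    """Comparisons QUICKSORT makes on a list of distinct items."""
    if len(items) <= 1:
        return 0
    pivot = items[0]
    smaller = [x for x in items[1:] if x < pivot]
    larger = [x for x in items[1:] if x > pivot]
    return (len(items) - 1) + comparisons(smaller) + comparisons(larger)


def average_by_enumeration(n):
    """Average of the comparison count over all n! orderings."""
    orders = list(permutations(range(n)))
    return Fraction(sum(comparisons(list(p)) for p in orders), len(orders))


q = average_by_recurrence(30)
print([str(q[n]) for n in range(1, 7)])
print([str(average_by_enumeration(n)) for n in range(1, 7)])
```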


10 The sum is an approximation to the area under the curve y = 1/x from x = 1 to x = n.


4.8. Exercises 


FIBONACCI NUMBERS. In these exercises, $F_n$ denotes the nth Fibonacci number.


1. (a) There are n seating positions arranged in a line. Prove that the number of ways
of choosing a subset of these positions, with no two chosen positions consecutive, is
$F_{n+1}$.
(b) If the n positions are arranged around a circle, show that the number of
choices is $F_n + F_{n-2}$ for n ≥ 2.
2. Prove the following identities:
(a) $F_n^2 - F_{n+1} F_{n-1} = (-1)^n$ for n ≥ 1.
(b) $\sum_{i=0}^{n} F_i = F_{n+2} - 1$.
(c) $F_{n-1}^2 + F_n^2 = F_{2n}$, $F_{n-1} F_n + F_n F_{n+1} = F_{2n+1}$.
(d) $F_n = \sum_{i=0}^{\lfloor n/2 \rfloor} \binom{n-i}{i}$.
3. Show that $F_n$ is composite for all odd n > 3.
4. Show that
$$\sum_{i=0}^{\lfloor (n-1)/2 \rfloor} F_{n-2i} = F_{n+1} - 1$$
for n ≥ 1.
5. Prove that every non-negative integer x less than $F_{n+1}$ can be expressed in a
unique way in the form
$$F_{i_1} + F_{i_2} + \cdots + F_{i_r},$$
where $i_1, i_2, \ldots, i_r \in \{1, \ldots, n\}$, $i_1 > i_2 + 1$, $i_2 > i_3 + 1$, ... (in other words, $i_1, \ldots, i_r$
are all distinct and no two are consecutive). Deduce Exercise 1(a).
[HINT: By Exercise 4, the largest expression that can be made
using Fibonacci numbers below $F_n$ is $F_n - 1$. So, if $F_n \le x < F_{n+1}$, then $F_n$ must be
included in the sum; and $x - F_n < F_{n-1}$, so $F_{n-1}$ cannot be included.]


6. (Fibonacci numbers are traditionally associated with the breeding of rabbits.¹¹)
Assume that a pair of rabbits does not breed in its first month, and that it produces
a pair of offspring in each subsequent month. Assume also that rabbits live forever.
Show that, starting with one newborn pair of rabbits, the number of pairs alive in
the nth month is $F_n$.

7. Prove that the number of additions required to compute the Fibonacci number
$F_n$ according to the ‘inefficient’ algorithm described in the text is $F_n - 1$.
8. (a) Prove that $F_{m+n} = F_m F_n + F_{m-1} F_{n-1}$ for m, n ≥ 0 (with the convention that
$F_{-1} = 0$).

(b) Use this to derive an algorithm for calculating $F_n$ using only c log n arithmetic
operations. [HINT: see Russian peasant multiplication (Exercise 12 of Chapter 2).]

(c) Given that multiplication is slower than addition, is this algorithm really better?


11 This example is due to Fibonacci (Leonardo of Pisa) himself.
MISCELLANEOUS RECURRENCES AND GENERATING FUNCTIONS.


9. (a) Solve the following recurrence relations.
(i) $f(n+1) = f(n)^2$, f(0) = 2.
(ii) f(n + 1) = f(n) + f(n − 1) + f(n − 2), f(0) = f(1) = f(2) = 1.
(iii) $f(n+1) = 1 + \sum_{i=0}^{n-1} f(i)$, f(0) = 1.
(b) Show that the number of ways of writing n as a sum of positive integers,
where the order of the summands is significant, is $2^{n-1}$ for n ≥ 1.
10. The number f(n) of steps required to solve the ‘Chinese rings puzzle’ with n
rings satisfies f(1) = 1 and

$$f(n+1) = \begin{cases} 2f(n), & n \text{ odd}, \\ 2f(n) + 1, & n \text{ even}. \end{cases}$$

Prove that f(n + 2) = f(n + 1) + 2f(n) + 1. Hence or otherwise find a formula for
f(n).¹²

11. (a) Let s(n) be the number of sequences $(x_1, \ldots, x_k)$ of integers satisfying
$1 \le x_i \le n$ for all i and $x_{i+1} \ge 2x_i$ for i = 1, ..., k − 1. (The length of the sequence
is not specified; in particular, the empty sequence is included.) Prove the recurrence

$$s(n) = s(n-1) + s(\lfloor n/2 \rfloor)$$

for n ≥ 1, with s(0) = 1. Calculate a few values of s. Show that the generating
function S(t) satisfies $(1 - t) S(t) = (1 + t) S(t^2)$.
(b) Let u(n) be the number of sequences $(x_1, \ldots, x_k)$ of integers satisfying
$1 \le x_i \le n$ for all i and $x_{i+1} > \sum_{j=1}^{i} x_j$ for i = 1, ..., k − 1. Calculate a few values
of u. Can you discover a relationship between s and u? Can you prove it?

12. Let F(t) be a formal power series with constant term 1. By finding a recurrence
relation for its coefficients, show that there is a multiplicative inverse G(t) of F(t).
Moreover, if the coefficients of F are integers, so are those of G.

13. A permutation π of the set {1, ..., n} is called connected if there does not exist
a number k with 1 ≤ k < n such that π maps the subset {1, 2, ..., k} into itself. Let
$c_n$ be the number of connected permutations. Prove that

$$\sum_{i=1}^{n} c_i (n-i)! = n!.$$

Deduce that, if $F(t) = \sum_{n \ge 1} n!\, t^n$ and $G(t) = \sum_{n \ge 1} c_n t^n$ are the generating functions of
the sequences (n!) and $(c_n)$ respectively, then $1 - G(t) = (1 + F(t))^{-1}$. (Note that
F(t) and G(t) diverge for all t ≠ 0.)


12 The formula, and an algorithm for solution, were given in 1872 in Théorie du Baguenodier, by
‘Un Clerc de Notaire Lyonnais’ (now identified as Louis Gros). See S. N. Afriat, The Ring of Linked
Rings (1982).
14. Let
$$\prod_{n \ge 1} (1 + t^n) = \sum_{n \ge 0} a_n t^n.$$
Prove that $a_n$ is the number of ways of writing n as the sum of distinct positive
integers. (For example, $a_6 = 4$, since 6 = 5 + 1 = 4 + 2 = 3 + 2 + 1.)


15. (a) In an election, there are two candidates, A and B; the number of votes
cast is 2n. Each candidate receives exactly n votes; but, at every intermediate point
during the count, A has received more votes than B. Show that the number of ways
this can happen is the Catalan number $C_n$. [HINT: A leads by just one vote after
the first vote is counted. Suppose that this next occurs after 2i + 1 votes have been
counted. Then there are f(i) choices for the count between these points, and f(n − i)
choices for the rest of the count, where f(n) is the required number; so we obtain
the Catalan recurrence.]

HARDER PROBLEM. Can you construct a bijection between the bracketed expressions
and the voting patterns in (a)?


(b) In the above election, assume only that, at any intermediate stage, A has
received at least as many votes as B. Prove that the number of possibilities is now
$C_{n+1}$. [HINT: Give A an extra vote at the beginning of the count, and B an extra
vote at the end.]


16. A clown stands on the edge of a swimming pool, holding a bag containing n
red and n blue balls. He draws the balls out one at a time and discards them. If
he draws a blue ball, he takes one step back; if a red ball, one step forward. (All
steps are the same size.) Show that the probability that the clown remains dry is
1/(n + 1).


17. Prove that 


B jn 


[HINT: See the footnote on p. 56.]


18. Prove that the exponential generating function for the numbers s(n) of Section 4.4
is $\exp(t + \tfrac{1}{2} t^2)$.


19. The Bernoulli numbers $b_n$ (not to be confused with the Bell numbers!) are defined
by the recurrence $b_0 = 1$ and

$$\sum_{k=0}^{n} \binom{n+1}{k} b_k = 0$$

for n ≥ 1. Prove that the exponential generating function

$$f(t) = \sum_{n \ge 0} \frac{b_n t^n}{n!}$$

is given by f(t) = t/(exp(t) − 1).
Show that $f(t) + \tfrac{1}{2} t$ is an even function of t, and deduce that $b_n = 0$ for all odd
n ≥ 3.

REMARK. The Bernoulli numbers play an important and unexpected role in topics
as diverse as numerical analysis, Fermat’s last theorem and p-adic integration.


What is the solution of the similar-looking recurrence b} = 1 and 


Ee- 


k=0 


forn > 1? 


20. For even n, let $e_n$ be the number of permutations of {1, ..., n} with all cycles
even; $o_n$, the number of permutations with all cycles odd; and $p_n = n!$ the total
number of permutations. Let E(t), O(t) and P(t) be the exponential generating
functions of these sequences. Show that

(a) $P(t) = (1 - t^2)^{-1}$;

(b) $E(t) = (1 - t^2)^{-1/2}$; [HINT: Exercise 15 of Chapter 3]

(c) E(t)·O(t) = P(t);

(d) $e_n = o_n$ for all even n.

[I don’t know any ‘bijective’ proof of the last equality.]


QUESTIONS oN QUICKSORT AND BINARY TREES. 


21. Show that QUICKSORT sometimes requires all (3) comparisons to sort a list. For how many 
orderings does this occur? One such ordering is the case when the list is already sorted — is this a 
serious defect of QUICKSORT? 


22. Let $m_n$ be the minimum number of comparisons required by QUICKSORT to sort a list of length
n. Prove that, for each integer $k \ge 1$, $m_n$ is a linear function of n on the interval from $2^{k-1} - 1$ to
$2^k - 1$, with

$$m_{2^k - 1} = (k - 2)2^k + 2.$$

If $n = 2^k - 1$, what can you say about the number of orderings requiring $m_n$ comparisons?


23. This exercise justifies the ‘Twenty Questions’ principle. We are given N objects and required 
to distinguish them by asking questions, each of which has two possible answers. The aim of this
exercise is to show that, no matter what scheme of questioning is adopted, on average the number
of questions required is at least $\log_2 N$. (For some schemes, the average may be much larger. If we
ask ‘Is it $a_1$?’, ‘Is it $a_2$?’, etc., then on average (N + 1)/2 questions are needed!)

A binary tree is a graph (see Chapter 2) with the following properties: 

• there is a vertex (the root) lying on just two edges;

• every other vertex lies on one or three edges (and is called a leaf or an internal vertex accordingly);

• there are no circuits (closed paths of distinct vertices), and every vertex can be reached by a
path from the root.

It is convenient to arrange the vertices of the tree on successive levels, with the root on level 0. 
Then any non-leaf is joined to two successors on the next level, and every vertex except the root has 
one predecessor. The height of a vertex is the number of the level on which it lies. 

In our situation, a vertex is any set of objects which can be distinguished by some sequence of
questions. The root corresponds to the whole set (before any questions are asked), and leaves are
singleton sets. The two successors of a vertex are the sets distinguished by the two possible answers
to the next question. The height of a leaf is the number of questions required to identify that object
uniquely.

STEP 1. Show that there are two leaves of maximal height (h, say) with the same predecessor.
Deduce that, if there is a leaf of height less than h - 1, we can find another binary tree with N leaves
having smaller average height. Hence conclude that, in a tree with minimum average height, every
leaf has height m or m + 1, for some m.

STEP 2. Since there are no leaves at height less than m, there are altogether $2^m$ vertices on level m.

STEP 3. If there are p internal vertices on level m, show that there are 2p leaves of height m + 1,
and $N - 2p = 2^m - p$ of height m; so $N = 2^m + p$, where $0 \le p < 2^m$.

STEP 4. Prove that $\log_2(2^m + p) \le m + 2p/(2^m + p)$, and deduce that the average height of leaves
is at least $\log_2 N$.


REMARK. Sorting a list is equivalent to finding the permutation which takes the given order of the 
list into the ‘correct’ order; thus it involves identifying one of n! possibilities. So any sorting method 
which compares elements of the list will require, on average, at least $\log_2 n! = n \log n/\log 2 + O(n)$
comparisons, as claimed in the text. Figure 4.1 shows the binary tree for QUICKSORT with n = 3. 


Fig. 4.1. Binary tree for QUICKSORT 
(Left = yes, Right = no) 


24. Suppose that the two successors of each non-leaf node in a binary tree are distinguished as ‘left’
and ‘right’. Show that, with this convention, the number of binary trees with n leaves is the Catalan
number $C_n$. [HINT: Removing the root gives two binary trees, a ‘left’ and a ‘right’ tree. Use this to
verify the recurrence relation.]


5. The Principle of Inclusion and 
Exclusion 


To every thing there is a season, and a time to every purpose under the 
heaven: 

A time to be born, and a time to die; a time to plant, and a time to pluck up 
that which is planted; 

A time to kill, and a time to heal; a time to break down, and a time to build
up;

A time to weep, and a time to laugh; a time to mourn, and a time to dance; 
A time to cast away stones, and a time to gather stones together; a time to 
embrace, and a time to refrain from embracing; 

A time to get, and a time to lose; a time to keep, and a time to cast away; 
A time to rend, and a time to sew; a time to keep silence, and a time to speak;
A time to love, and a time to hate; a time of war, and a time of peace. 


Ecclesiastes, Chapter 3 


TOPICS: Principle of Inclusion and Exclusion; Stirling numbers;
even and odd permutations 


TECHNIQUES: Generating function tricks; matrix inverses 
ALGORITHMS: 


CROSS-REFERENCES: set-partitions, cycles of permutations, inverse
of Pascal’s triangle (Chapter 3); derangements, exponential generating
function, [Bernoulli numbers] (Chapter 4); Möbius inversion
(Chapter 12)


Suppose we are given a family of sets, and told the number of elements which 
lie simultaneously in every set of each possible subfamily. Then we have enough 
information to work out how many elements lie in none of the sets, or indeed, how 
many lie in each region of the Venn diagram of the family. The Principle of Inclusion 
and Exclusion, known as PIE for short, is a formula for calculating this. It gives 
rise to another proof of the theorem about inverting Pascal’s triangle, as well as a 
formula for the number of partitions of an n-set into k parts. This last number is a 
so-called Stirling number of the second kind. We spend the second half of the chapter 
investigating these numbers and their relatives, and their surprising properties. 


5.1. PIE 


In a class of 100 pupils, a survey establishes that 45 play cricket, 53 play tiddlywinks, 
and 55 play Space Invaders. Furthermore, 28 play cricket and tiddlywinks; 32 play 
cricket and Space Invaders; 35 play tiddlywinks and Space Invaders; and 20 play 
all three sports. How many pupils don’t play any sport? 

This problem can be answered by drawing a Venn diagram to represent the
three sets. Then the numbers in each region can be worked out in turn, until finally 
the number in none of the regions is found. For example, 8 pupils play cricket and 
tiddlywinks but not Space Invaders. 


Fig. 5.1. A Venn diagram 


The Principle of Inclusion and Exclusion gives a formula for this calculation, not 
relying on our ability to draw meaningful Venn diagrams with arbitrarily many sets. 
First, some notation. Let X be our ‘universe’ (corresponding to the whole class
in the example), and let $(A_1, A_2, \ldots, A_n)$ be a family of subsets of X. (It is not
forbidden that some set occurs more than once in the sequence.) If I is a subset of
the index set $\{1,\ldots,n\}$, we set

$$A_I = \bigcap_{i \in I} A_i,$$

with the convention that $A_\emptyset = X$. (Intersecting more sets gives a smaller result; so
intersecting no sets at all should give the largest possible set.)


(5.1.1) Principle of Inclusion and Exclusion
Let $(A_1,\ldots,A_n)$ be a family of subsets of X. Then the number of
elements of X which lie in none of the subsets $A_i$ is

$$\sum_{I \subseteq \{1,\ldots,n\}} (-1)^{|I|} |A_I|.$$


PROOF. The sum in (5.1.1) is a linear combination of cardinalities of sets $A_I$ with
coefficients +1 or $-1$. We calculate, for each point of X, its ‘contribution’ to the
sum, that is, the sum of the coefficients of the sets $A_I$ which contain it.

Suppose first that $x \in X$ lies in none of the sets $A_i$. Then the only term in the
sum to which x contributes is that with $I = \emptyset$; and its contribution is 1.

Otherwise, the set $J = \{i \in \{1,\ldots,n\} : x \in A_i\}$ is non-empty; and $x \in A_I$
precisely when $I \subseteq J$. Thus, the contribution of x is

$$\sum_{I \subseteq J} (-1)^{|I|} = \sum_{i=0}^{j} \binom{j}{i} (-1)^i = (1 - 1)^j = 0$$

by the Binomial Theorem, where $j = |J|$.

Thus, points lying in no set $A_i$ contribute 1 to the sum, while points in some $A_i$
contribute 0; so the overall sum is the number of points lying in none of the sets, as
claimed.

PIE has a natural interpretation for small n. For n = 2, we take the number
of points in X, and subtract the sum of the numbers in $A_1$ and $A_2$; the points in
$A_1 \cap A_2$ have been subtracted twice, and must be added in again.† For n = 3, after
the pairwise intersections have been added, we find that the points lying in all three
sets have been included once too often, and must be removed again.
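
As a small illustration, the survey example from the start of this section can be run through the PIE sum directly. The sketch below is not from the text: the function name `pie_count_outside` and the single-letter labels C, T, S (cricket, tiddlywinks, Space Invaders) are assumptions made for the example.

```python
from itertools import combinations

def pie_count_outside(universe_size, intersection_size):
    """Number of elements lying in none of the sets, by inclusion-exclusion.

    intersection_size maps each non-empty frozenset I of labels to |A_I|.
    """
    labels = sorted(set().union(*intersection_size))
    total = universe_size  # the term for the empty index set, |A_I| = |X|
    for r in range(1, len(labels) + 1):
        for subset in combinations(labels, r):
            total += (-1) ** r * intersection_size[frozenset(subset)]
    return total

# Survey data from the class example (hypothetical labels C, T, S).
survey = {
    frozenset("C"): 45, frozenset("T"): 53, frozenset("S"): 55,
    frozenset("CT"): 28, frozenset("CS"): 32, frozenset("TS"): 35,
    frozenset("CTS"): 20,
}
print(pie_count_outside(100, survey))  # 100 - 153 + 95 - 20 = 22
```

So, on these figures, 22 pupils play none of the three sports.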


We proceed to a couple of applications of PIE. 


(5.1.2) Corollary. The number of surjective mappings from an n-set to a k-set is
given by

$$\sum_{i=0}^{k} (-1)^i \binom{k}{i} (k - i)^n.$$

In particular, we have

$$n! = \sum_{i=0}^{n} (-1)^i \binom{n}{i} (n - i)^n.$$
PROOF. We take X to be the set of all mappings from $\{1,\ldots,n\}$ to $\{1,\ldots,k\}$, so
that $|X| = k^n$. For $i = 1,\ldots,k$, we let $A_i$ be the set of mappings f for which the
point i does not lie in the range of f. Then each f(x) can be any of the k - 1 points
different from i, and so $|A_i| = (k - 1)^n$. More generally, $A_I$ consists of all mappings
whose range contains no point of I, and $|A_I| = (k - |I|)^n$.

A mapping is a surjection if and only if it lies in none of the sets $A_i$. So, by PIE,
the number of surjections is equal to

$$\sum_{I \subseteq \{1,\ldots,k\}} (-1)^{|I|} (k - |I|)^n.$$

Put $i = |I|$. There are $\binom{k}{i}$ sets I of cardinality i, where i runs from 0 to k; this gives
the result.

If k = n, then the permutations of $\{1,\ldots,n\}$ are precisely the surjective
mappings from this set to itself.
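
The surjection count can be checked numerically for small n and k. The following is a minimal sketch, with hypothetical function names, comparing the PIE sum against direct enumeration of all mappings.

```python
from itertools import product
from math import comb, factorial

def surjections_pie(n, k):
    """Number of surjections from an n-set to a k-set, via the PIE sum."""
    return sum((-1) ** i * comb(k, i) * (k - i) ** n for i in range(k + 1))

def surjections_brute(n, k):
    """Enumerate all k**n mappings and keep those whose range is everything."""
    return sum(1 for f in product(range(k), repeat=n) if len(set(f)) == k)

print(surjections_pie(4, 3))  # 81 - 48 + 3 = 36
```

Taking k = n recovers n!, as in the second formula of the Corollary; for instance `surjections_pie(5, 5)` equals `factorial(5)`.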
For a second application, we give a second proof of the formula for the number 
of derangements. 


(5.1.3) Theorem. The number of derangements of $\{1,\ldots,n\}$ is equal to

$$n! \sum_{i=0}^{n} \frac{(-1)^i}{i!}.$$

PROOF. This time, we take X to be the set of permutations, and $A_i$ the set of
permutations fixing the point i; so $|A_i| = (n - 1)!$, and more generally,
$|A_I| = (n - |I|)!$, since permutations in $A_I$ fix every point in I and permute the remaining
points arbitrarily. A permutation is a derangement if and only if it lies in none of
the sets $A_i$; so the number of derangements is

$$\sum_{I \subseteq \{1,\ldots,n\}} (-1)^{|I|} (n - |I|)! = \sum_{i=0}^{n} (-1)^i \binom{n}{i} (n - i)!$$

on putting $i = |I|$. The result follows on noting that $\binom{n}{i}(n - i)! = n!/i!$.


† This gives the familiar identity $|A_1 \cup A_2| + |A_1 \cap A_2| = |A_1| + |A_2|$ (see Section 2.7).
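
A quick numerical check of the derangement count, again as a sketch with hypothetical function names: the PIE sum is compared with a direct count of fixed-point-free permutations.

```python
from itertools import permutations
from math import comb, factorial

def derangements_pie(n):
    """Sum over i of (-1)^i * C(n, i) * (n - i)!, as in the proof."""
    return sum((-1) ** i * comb(n, i) * factorial(n - i) for i in range(n + 1))

def derangements_brute(n):
    """Count permutations of {0, ..., n-1} with no fixed point."""
    return sum(1 for p in permutations(range(n)) if all(p[i] != i for i in range(n)))

print([derangements_pie(n) for n in range(7)])  # [1, 0, 1, 2, 9, 44, 265]
```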


5.2. A generalisation 


In the introductory example, it is clear that there is enough information to find, 
not only the number of pupils who play none of the sports, but (for example) the 
number who play cricket only. This can be formulated in general, as we will do in 
this section. As a consequence, we give a different proof of Exercise 23 of Chapter 3, 
about the inverse of the matrix of binomial coefficients. 


(5.2.1) Proposition. Let $(A_1,\ldots,A_n)$ be a family of sets, and I a subset of the index
set $\{1,\ldots,n\}$. Then the number of elements which belong to $A_i$ for all $i \in I$ and
for no other values is

$$\sum_{J \supseteq I} (-1)^{|J \setminus I|} |A_J|.$$


PROOF. We define a new family of sets indexed by $N \setminus I$, where $N = \{1,\ldots,n\}$, by
setting $B_k = A_{I \cup \{k\}}$ for $k \in N \setminus I$. The Proposition asks us to calculate the number
of elements of $A_I$ lying in none of the sets $B_k$. By PIE, this number is

$$\sum_{K \subseteq N \setminus I} (-1)^{|K|} |B_K|,$$

where $B_\emptyset = A_I$. Now the correspondence $K \leftrightarrow J = I \cup K$ between subsets of $N \setminus I$
and subsets of N containing I is a bijection; and $B_K = A_J$ if K and J correspond.
So the result is true.


Next, we turn this result into more abstract form, referring to arbitrary set 
functions rather than cardinalities of sets. 


(5.2.2) Proposition. Let $N = \{1,\ldots,n\}$, and let f and g be functions from P(N) to
the rational (or real) numbers. Then the following are equivalent:

(a) $g(I) = \sum_{J \supseteq I} f(J)$;

(b) $f(I) = \sum_{J \supseteq I} (-1)^{|J \setminus I|} g(J)$.


PROOF. We argue that it suffices to prove the result when the values of f are
non-negative integers. For either of (a) and (b) can be regarded as a system of
$2^n$ linear equations in $2^n$ unknowns (the values of f or g); this means that the
corresponding homogeneous system has only the zero solution in integers. But any
rational solution would give rise to an integer solution, on multiplying the solution
by a suitable integer (the least common multiple of the denominators). So the linear
equations have a unique rational solution. This means that the determinant of the
coefficients is non-zero, and this fact doesn’t change on passing from the rationals
to the reals.

But now, given any non-negative integer values of the function f, we can
construct a family $(A_1,\ldots,A_n)$ of sets with the property that the number of points
lying in $A_i$ for $i \in I$ but for no other values of i is exactly f(I). (Imagine a Venn
diagram for n sets; put f(I) elements in the region corresponding to this condition.)
Then $g(I) = \sum_{J \supseteq I} f(J)$ is the total number of elements in $A_I$; and the result follows
from (5.2.1).


The same result with the set inclusions reversed is also true: 


(5.2.3) Proposition. Let $N = \{1,\ldots,n\}$, and let f and g be functions from P(N) to
the rational (or real) numbers. Then the following are equivalent:

(a) $g(I) = \sum_{J \subseteq I} f(J)$;

(b) $f(I) = \sum_{J \subseteq I} (-1)^{|I \setminus J|} g(J)$.

To see this, we define new set functions f′ and g′ by the rules that $f'(I) = f(N \setminus I)$
and $g'(I) = g(N \setminus I)$, and apply (5.2.2) to these functions. If I′ and J′ denote $N \setminus I$
and $N \setminus J$ respectively, condition (a) becomes

$$g'(I') = g(I) = \sum_{J \subseteq I} f(J) = \sum_{J' \supseteq I'} f'(J').$$

Similarly, condition (b) translates correctly, because $|J' \setminus I'| = |I \setminus J|$.


(5.2.4) Corollary. Let f and g be real-valued functions on $\{0,\ldots,n\}$. Then the
following are equivalent:

(a) $g(i) = \sum_{j \le i} \binom{i}{j} f(j)$;

(b) $f(i) = \sum_{j \le i} (-1)^{i-j} \binom{i}{j} g(j)$.

PROOF. We define set functions F and G on P(N) by letting F(I) = f(i) and
G(I) = g(i) whenever |I| = i. Now, if |I| = i, then I has $\binom{i}{j}$ subsets of size j, and
the result follows immediately from (5.2.3).

This result gives an alternative proof of the result of Chapter 3, Exercise 23, 
about inverting Pascal’s Triangle. We repeat the result for reference. 


(5.2.5) Theorem. Let n be given, and let A and B be the (n + 1) × (n + 1) matrices
(with rows and columns indexed from 0 to n) having (i, j) entries

$$a_{ij} = \binom{i}{j} \quad\text{and}\quad b_{ij} = (-1)^{i-j} \binom{i}{j}$$

respectively. Then $B = A^{-1}$.

PROOF. Let V be the real vector space of functions from $\{0,\ldots,n\}$ to $\mathbb{R}$; each vector
f is represented by the (n + 1)-tuple (f(0), ..., f(n)). Then the matrices A and B
represent linear transformations of V mapping the function f to the function g and
back again (in the notation of (5.2.4)); so one is the inverse of the other.
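
The inverse relationship between Pascal's triangle and its signed version is easy to confirm for small n. The sketch below (function names are assumptions for illustration) builds both matrices with exact integer entries and multiplies them.

```python
from math import comb

def pascal_pair(n):
    """Matrices A = (C(i, j)) and B = ((-1)^(i-j) C(i, j)), indexed 0..n."""
    size = n + 1
    A = [[comb(i, j) for j in range(size)] for i in range(size)]
    # comb(i, j) is 0 when j > i, so the sign only matters on or below the diagonal.
    B = [[(-1 if (i - j) % 2 else 1) * comb(i, j) for j in range(size)]
         for i in range(size)]
    return A, B

def mat_mul(P, Q):
    """Product of two square integer matrices given as lists of rows."""
    size = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]

def identity(size):
    return [[1 if i == j else 0 for j in range(size)] for i in range(size)]

A, B = pascal_pair(6)
print(mat_mul(A, B) == identity(7))  # True
```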


5.3. Stirling numbers


In this section, we look at two 2-parameter families of numbers. They are related to 
the factorials and Bell numbers in much the same way that the binomial coefficients 
are related to the powers of 2. (In a sense, they complete the pattern ‘subsets, 
permutations, partitions’ of Chapter 3.) The reasons for discussing them here are a 
bit tenuous: their surprising relationship to each other ((5.3.4) below) parallels that 
of the binomial coefficients to their signed versions, proved using PIE in the last 
section; and there is a formula for the Stirling numbers of the second kind, which 
is an application of PIE, from which some of their most important properties are 
derived. 


Let n and k be positive integers with $k \le n$.

The Stirling number of the first kind, s(n, k), is defined by the rule that
$(-1)^{n-k} s(n,k)$ is the number of permutations of $\{1,\ldots,n\}$ with k cycles. (Note
the sign. Sometimes a different convention is used, according to which the Stirling
numbers are the absolute values of those defined here.)

The Stirling number of the second kind, S(n, k), is the number of partitions of
$\{1,\ldots,n\}$ with k (non-empty) parts.

The definitions can be extended to all n and k by defining the Stirling numbers
to be 0 unless $1 \le k \le n$.


(5.3.1) Proposition. (a) $\sum_{k=1}^{n} (-1)^{n-k} s(n,k) = \sum_{k=1}^{n} |s(n,k)| = n!$;

(b) $\sum_{k=1}^{n} S(n,k) = B_n$, where $B_n$ is the n-th Bell number.


This is clear from the definition. 


Both arrays satisfy recurrence relations, similar to that for Pascal’s triangle. 
Recall that s(n,0) = S(n,0) =0 for all n. 


(5.3.2) Proposition. (a) s(n, n) = S(n,n) = 1; 
(b) s(n +1,k) = —ns(n,k) + s(n, k - 1); 
(c) S(n+1,k) = kS(n,k) + S(n,k — 1). 


PROOF. (a) is clear; the proofs of (b) and (c) are similar. Consider first partitions of
$\{1,\ldots,n+1\}$ with k parts. Either n + 1 is a singleton part (in which case $\{1,\ldots,n\}$
is partitioned into k - 1 parts), or n + 1 is adjoined to one of the k parts into which
$\{1,\ldots,n\}$ is partitioned.

The case of permutations requires a little more care. Given a cycle of length l,
there are l places at which a new point can be interpolated, giving l different cycles.
So, given a permutation of $\{1,\ldots,n\}$ with k cycles, there are n ways of interpolating
the point n + 1 so as to have k cycles resulting (since the cycle lengths sum to n). In
addition, we could add the one-point cycle (n + 1) to a permutation of $\{1,\ldots,n\}$
with k - 1 cycles. Thus

$$|s(n+1,k)| = n|s(n,k)| + |s(n,k-1)|,$$

and on putting the signs in correctly we obtain the result.
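
The recurrences in (5.3.2) give a simple way to tabulate both kinds of Stirling numbers. The sketch below is one possible arrangement (the function name `stirling_tables` and the list-of-lists layout are assumptions); it also lets the row sums of (5.3.1) be checked.

```python
from math import factorial

def stirling_tables(n_max):
    """Tables s[n][k] and S[n][k] for 0 <= n, k <= n_max, built from (5.3.2).

    Entries are 0 unless 1 <= k <= n, following the convention in the text.
    """
    s = [[0] * (n_max + 1) for _ in range(n_max + 1)]
    S = [[0] * (n_max + 1) for _ in range(n_max + 1)]
    for n in range(1, n_max + 1):
        s[n][n] = S[n][n] = 1
        for k in range(1, n):
            # (5.3.2)(b) and (c), with n replaced by n - 1
            s[n][k] = -(n - 1) * s[n - 1][k] + s[n - 1][k - 1]
            S[n][k] = k * S[n - 1][k] + S[n - 1][k - 1]
    return s, S

s, S = stirling_tables(6)
print(s[4][1:5])  # [-6, 11, -6, 1]
print(S[4][1:5])  # [1, 7, 6, 1]
```

For n = 4 the absolute values 6 + 11 + 6 + 1 sum to 24 = 4!, and 1 + 7 + 6 + 1 = 15 is the Bell number $B_4$, in agreement with (5.3.1).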


Using this recurrence, we prove a remarkable ‘generating function’ form. Recall
that $(t)_n = t(t-1)\cdots(t-n+1)$.


(5.3.3) Proposition. (a) $(t)_n = \sum_{k=1}^{n} s(n,k) t^k$;

(b) $t^n = \sum_{k=1}^{n} S(n,k) (t)_k$.


PROOF. The proofs are by induction on n. Since $t^1 = (t)_1 = t$ and s(1,1) = S(1,1) = 1,
the inductions begin at n = 1.

PROOF OF (a). Assume that $(t)_n = \sum_{k=1}^{n} s(n,k) t^k$. Then we have

$$(t)_{n+1} = (t)_n (t - n) = \Bigl(\sum_{k=1}^{n} s(n,k) t^k\Bigr)(t - n),$$

and the coefficient of $t^k$ on the right is $-n\,s(n,k) + s(n,k-1) = s(n+1,k)$.

PROOF OF (b). Assume that $t^n = \sum_{k=1}^{n} S(n,k) (t)_k$. Then

$$t^{n+1} = t \cdot t^n = \sum_{k=1}^{n} S(n,k) (t)_k \bigl((t - k) + k\bigr).$$

Since $(t)_k (t - k) = (t)_{k+1}$, we have

$$t^{n+1} = \sum_{k=1}^{n} S(n,k) (t)_{k+1} + \sum_{k=1}^{n} k\,S(n,k) (t)_k
= \sum_{k=1}^{n+1} \bigl(S(n,k-1) + k\,S(n,k)\bigr) (t)_k
= \sum_{k=1}^{n+1} S(n+1,k) (t)_k,$$

since S(n,0) = S(n,n+1) = 0.


There are direct combinatorial proofs of this result. Such a proof for (b) is
outlined in Exercise 5; but the argument for (a) involves the concept of group action
and the Orbit-Counting Lemma, and is deferred until Part 2.


(5.3.4) Corollary. Let A and B be the n × n matrices whose (i, j) entries are given
by the Stirling numbers s(i, j) and S(i, j) respectively. Then $B = A^{-1}$.

PROOF. A and B are the transition matrices between two different bases for the
space of polynomials of degree at most n with constant term zero:

• First basis: $t, t^2, \ldots, t^n$;

• Second basis: $(t)_1, (t)_2, \ldots, (t)_n$.
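
The change-of-basis argument can be made concrete. In the sketch below (all function names are assumptions), the rows of A are obtained by expanding each falling factorial $(t)_i$ in powers of t, the rows of B come from the recurrence for S(i, j), and the product is compared with the identity matrix.

```python
def falling_factorial_coeffs(n):
    """Coefficients [c_0, ..., c_n] of (t)_n = t(t-1)...(t-n+1) in powers of t."""
    coeffs = [1]
    for m in range(n):
        shifted = [0] + coeffs                    # t * p(t)
        scaled = [-m * c for c in coeffs] + [0]   # -m * p(t)
        coeffs = [a + b for a, b in zip(shifted, scaled)]
    return coeffs

def stirling2(n, k):
    """S(n, k) from the recurrence S(n, k) = k S(n-1, k) + S(n-1, k-1)."""
    if n == k:
        return 1
    if k < 1 or k > n:
        return 0
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)

def transition_matrices(n):
    """A[i][j] = s(i+1, j+1) and B[i][j] = S(i+1, j+1), as n x n matrices."""
    A = [[(falling_factorial_coeffs(i) + [0] * n)[j] for j in range(1, n + 1)]
         for i in range(1, n + 1)]
    B = [[stirling2(i, j) for j in range(1, n + 1)] for i in range(1, n + 1)]
    return A, B

def is_inverse_pair(A, B):
    n = len(A)
    return all(sum(A[i][k] * B[k][j] for k in range(n)) == (1 if i == j else 0)
               for i in range(n) for j in range(n))

A, B = transition_matrices(5)
print(is_inverse_pair(A, B))  # True
```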


We conclude with a formula for the Stirling numbers of the second kind. 


(5.3.5) Proposition. $S(n,k) = \dfrac{1}{k!} \sum_{j=1}^{k} (-1)^{k-j} \binom{k}{j} j^n$.

PROOF. We saw in Section 5.1 that this expression, without the factor 1/k!, is
the number of surjections from $\{1,\ldots,n\}$ to $\{1,\ldots,k\}$. (I have also replaced the
dummy variable i by j = k - i, and dropped the term with j = 0.) So it suffices to
prove that the number of surjections is k! S(n, k).

Each surjection f defines a partition of $\{1,\ldots,n\}$ with k non-empty parts, viz.,
$f^{-1}(1), \ldots, f^{-1}(k)$. But every partition arises from exactly k! surjections, since we
may assign the numbers 1, ..., k to the parts in any order. The result is proved.
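
The closed formula can be compared with a direct count of partitions. In the sketch below (function names are assumptions), `count_partitions` places the points 1, ..., n one at a time, each point either joining an existing part or opening a new one; this is a counting device chosen for the example, not the argument used in the proof.

```python
from math import comb, factorial

def stirling2_formula(n, k):
    """S(n, k) from (5.3.5): (1/k!) * sum_j (-1)^(k-j) C(k, j) j^n."""
    total = sum((-1) ** (k - j) * comb(k, j) * j ** n for j in range(1, k + 1))
    return total // factorial(k)  # the sum is k! S(n, k), so division is exact

def count_partitions(n, k):
    """Count partitions of an n-set into k non-empty parts, point by point."""
    def extend(pos, parts_used):
        if pos == n:
            return 1 if parts_used == k else 0
        if parts_used > k:
            return 0
        # the next point joins one of the existing parts, or opens a new part
        return parts_used * extend(pos + 1, parts_used) + extend(pos + 1, parts_used + 1)
    return extend(0, 0)

print(stirling2_formula(4, 2))  # (16 - 2) / 2 = 7
```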


5.4. Project: Stirling numbers and exponentials 


In this section, we explore a different way of looking at the inverse relationship 
between the two kinds of Stirling numbers: they correspond to substitution of 
exponential or logarithmic functions into a power series. 

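We begin with the Stirling numbers of the second kind. First, we obtain an exponential
generating function for S(n, k) for fixed k, as n varies.


(5.4.1) Proposition. $\displaystyle\sum_{n \ge 0} \frac{S(n,k) t^n}{n!} = \frac{(\exp(t) - 1)^k}{k!}$.

The proof uses the formula for S(n, k) derived using PIE. We have

$$\sum_{n \ge 0} \frac{S(n,k) t^n}{n!} = \frac{1}{k!} \sum_{j=0}^{k} (-1)^{k-j} \binom{k}{j} \sum_{n \ge 0} \frac{(jt)^n}{n!} = \frac{1}{k!} \sum_{j=0}^{k} (-1)^{k-j} \binom{k}{j} \exp(jt) = \frac{(\exp(t) - 1)^k}{k!}.$$

Note that this gives the e.g.f. of the Bell numbers as a corollary, since

$$\sum_{n \ge 0} \frac{B_n t^n}{n!} = \sum_{n \ge 0} \sum_{k=0}^{n} \frac{S(n,k) t^n}{n!} = \sum_{k \ge 0} \frac{(\exp(t) - 1)^k}{k!} = \exp(\exp(t) - 1),$$

on reversing the order of summation.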
This leads to the following result: 

(5.4.2) Theorem. Let $(f_n)$ and $(g_n)$ be sequences with e.g.f.s F(t) and G(t) respectively. Then the
following assertions are equivalent:

(a) $g_0 = f_0$ and $g_n = \sum_{k=1}^{n} S(n,k) f_k$ for $n \ge 1$;

(b) $G(t) = F(\exp(t) - 1)$.

PROOF. If (a) holds, then

$$G(t) = f_0 + \sum_{n \ge 1} \sum_{k=1}^{n} \frac{S(n,k) f_k t^n}{n!} = \sum_{k \ge 0} \frac{f_k (\exp(t) - 1)^k}{k!} = F(\exp(t) - 1).$$


Using the inverse relation between the Stirling numbers, we immediately deduce the following: 


(5.4.3) Theorem. Let $(f_n)$ and $(g_n)$ be sequences with e.g.f.s F(t) and G(t) respectively. Then the
following assertions are equivalent:

(a) $f_0 = g_0$ and $f_n = \sum_{k=1}^{n} s(n,k) g_k$ for $n \ge 1$;

(b) $F(t) = G(\log(1 + t))$.

We can use this result to derive the e.g.f. of the Stirling numbers of the first kind. Let $g_k = 1$
and $g_n = 0$ for $n \ne k$. Then, if f and g are related as in the theorem, we have $f_n = s(n,k)$. Thus,
we obtain


(5.4.4) Proposition. $\displaystyle\sum_{n \ge 0} \frac{s(n,k) t^n}{n!} = \frac{(\log(1 + t))^k}{k!}$.
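
The two generating functions of this section can be spot-checked by expanding truncated power series with exact rational arithmetic. The sketch below is only an illustration: the truncation order `N` and the helper names are assumptions, and the comparison uses the closed forms $S(n,2) = 2^{n-1} - 1$ and $s(n,1) = (-1)^{n-1}(n-1)!$ that appear in the exercises.

```python
from fractions import Fraction
from math import factorial

N = 7  # compare coefficients up to t^N (arbitrary cut-off for the example)

def series_power(base, k):
    """Coefficients of base(t)**k as a power series, truncated after t^N."""
    result = [Fraction(1)] + [Fraction(0)] * N
    for _ in range(k):
        product = [Fraction(0)] * (N + 1)
        for i, a in enumerate(result):
            for j, b in enumerate(base):
                if i + j <= N:
                    product[i + j] += a * b
        result = product
    return result

def egf_numbers(base, k):
    """The numbers c_n for which sum c_n t^n / n! equals base(t)**k / k!."""
    return [c * factorial(n) / factorial(k)
            for n, c in enumerate(series_power(base, k))]

# exp(t) - 1 and log(1 + t) as truncated power series
EXP_MINUS_1 = [Fraction(0)] + [Fraction(1, factorial(n)) for n in range(1, N + 1)]
LOG_1_PLUS = [Fraction(0)] + [Fraction((-1) ** (n + 1), n) for n in range(1, N + 1)]

second_kind_k2 = egf_numbers(EXP_MINUS_1, 2)  # should list S(n, 2)
first_kind_k1 = egf_numbers(LOG_1_PLUS, 1)    # should list s(n, 1)
print(all(second_kind_k2[n] == 2 ** (n - 1) - 1 for n in range(1, N + 1)))  # True
```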


5.5. Even and odd permutations 


Let π be a permutation of $\{1,\ldots,n\}$, and denote by c(π) the number of disjoint
cycles of π. The sign of π is defined to be $\mathrm{sign}(\pi) = (-1)^{n - c(\pi)}$; and π is said to be
even or odd according as its sign is +1 or $-1$. We show first:


(5.5.1) Proposition. For $n \ge 2$, there are equally many even and odd permutations
of an n-set.


PROOF. We use the formula

$$t(t-1)\cdots(t-n+1) = \sum_{k=1}^{n} s(n,k) t^k.$$

Putting t = 1 and using the fact that $n \ge 2$, we see that $\sum_{k=1}^{n} s(n,k) = 0$. But
s(n, k) is defined to be $(-1)^{n-k}$ times the number of permutations with k cycles;
so $\sum_{k} s(n,k)$ is the sum of the signs of the permutations in $S_n$, and so there are
equally many with either sign.


To analyse the sign further, we relate it to the composition of permutations. 
Recall the convention that composition works from left to right. 

(5.5.2) Proposition. Let π be a permutation of $\{1,\ldots,n\}$, and τ a transposition.
Then

$$c(\pi\tau) = c(\pi) \pm 1.$$

PROOF. We examine the effect of composition with a transposition (i j). If i and j
lie in different cycles of π, then these cycles are ‘stitched together’ in πτ, which has
one fewer cycle than π. (For suppose that the cycles are $(a_1 \ldots a_k)$ and $(b_1 \ldots b_l)$,
where $i = a_1$, $j = b_1$. Check that πτ has the cycle $(a_1 \ldots a_k\ b_1 \ldots b_l)$.) Conversely,
if i and j lie in the same cycle of π, then this cycle splits into two in πτ.


We see that πτ has the opposite sign to π. Hence, if a permutation π is a
product of m transpositions, then its sign is $(-1)^m$; and, in particular, however π is
expressed as a product of transpositions, the parity of the number of transpositions
is always the same.


(5.5.3) Theorem. (a) Any permutation is a product of transpositions.
(b) The map sign is a homomorphism from the symmetric group to the multiplicative
group {±1} of order 2.


PROOF. (a) It is intuitively clear that, however the numbers 1, ..., n are ordered, it is
possible to sort them into the usual order by a sequence of swaps. Formally, if two
points i and j lie in the same cycle of π, then composing π with the transposition
(i j) increases by 1 the number of cycles; so the result follows by induction on
$n - c(\pi)$.

(b) We have to show that $\mathrm{sign}(\pi_1\pi_2) = \mathrm{sign}(\pi_1)\,\mathrm{sign}(\pi_2)$. To show this, express
$\pi_2$ as a product of (say m) transpositions; composing $\pi_1$ with each transposition
changes its sign, so the overall effect is to multiply by $(-1)^m$.


It follows that the set of all even permutations in $S_n$ is a normal subgroup.
This subgroup is called the alternating group $A_n$. We now have two proofs that
$|A_n| = n!/2$ if $n \ge 2$. First, this is immediate from (5.5.1); second, $A_n$ is the kernel
of a homomorphism onto a group of order 2.
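
The definition of the sign by cycle count, and the facts just proved about it, can be tested exhaustively on a small symmetric group. The sketch below represents a permutation of {0, ..., n-1} as a tuple of images (a representation assumed for the example) and composes from left to right, as in the text.

```python
from itertools import permutations
from math import factorial

def cycle_count(perm):
    """Number of disjoint cycles of perm, including fixed points."""
    seen = [False] * len(perm)
    cycles = 0
    for start in range(len(perm)):
        if not seen[start]:
            cycles += 1
            x = start
            while not seen[x]:
                seen[x] = True
                x = perm[x]
    return cycles

def sign(perm):
    """sign = (-1)^(n - c), where c is the number of cycles."""
    return (-1) ** (len(perm) - cycle_count(perm))

def compose(p, q):
    """Left-to-right composition: apply p first, then q."""
    return tuple(q[p[i]] for i in range(len(p)))

S4 = list(permutations(range(4)))
even = [p for p in S4 if sign(p) == 1]
print(len(even))  # 12, which is 4!/2
```

Checking `sign(compose(p, q)) == sign(p) * sign(q)` over all pairs in `S4` confirms the homomorphism property in this case.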


5.6. Exercises 


1. An opinion poll reports that the percentage of voters who would be satisfied 
with each of three candidates A, B, C for President is 65%, 57%, 58% respectively. 
Further, 28% would accept A or B, 30% A or C, 27% B or C, and 12% would be 
content with any of the three. What do you conclude? 


2. Make tables of the two kinds of Stirling numbers for small values of n and k. 
3. Prove directly that S(n,1) = 1, $S(n,2) = 2^{n-1} - 1$, and $S(n,n-1) = \binom{n}{2}$. Find a
formula for S(n, n - 2).


4. Prove that |s(n,1)| = (n — 1)! using the recurrence relation, and show directly 
that the number of cyclic permutations of an n-set is (n — 1)!. 


5. This exercise outlines a proof that $t^n = \sum_{k=1}^{n} S(n,k) (t)_k$.

(a) Let t be a positive integer, $T = \{1,\ldots,t\}$, and $N = \{1,\ldots,n\}$. The number
of functions $f : N \to T$ is $t^n$. Given such a function f, define an equivalence relation
≡ on N by the rule

i ≡ j if and only if f(i) = f(j).

The classes of this equivalence relation can be numbered $C_1, \ldots, C_k$ (say), ordered
by the smallest points in the classes. (So $C_1$ contains 1; $C_2$ contains the smallest
number not in $C_1$; and so on.) Then the values $f(C_1), \ldots, f(C_k)$ are k distinct
elements of T, and so can be chosen in $(t)_k$ ways; the partition can be chosen in
S(n, k) ways. Summing over k proves the identity for the particular value of t.

(b) Prove that if a polynomial equation F(t) = G(t) is valid for all positive
integer values of the argument t, then the polynomials F and G are equal.


6. For this exercise, recall the Bernoulli numbers from Exercise 19 of Chapter 4,
especially the fact that their e.g.f. is $t/(\exp(t) - 1)$. Derive the formula

$$b_n = \sum_{k=1}^{n} \frac{(-1)^k k!\, S(n,k)}{k + 1}$$

for the n-th Bernoulli number.


7. Let $(f_n)$ and $(g_n)$ be sequences, with e.g.f.s F(t) and G(t) respectively. Show the
equivalence of the following assertions:
(a) $g_n = \sum_{k=0}^{n} \binom{n}{k} f_k$;
(b) $G(t) = F(t) \exp(t)$.
8. Show that a permutation which is a cycle of length m can be written as a product 
of m — 1 transpositions. Deduce that it is an even permutation if and only if its 
length is odd. Hence show that an arbitrary permutation is even if and only if it 
has an even number of cycles of even length (with no restriction on cycles of odd 
length). 
9. This exercise outlines the way in which the sign of permutations is normally treated
by algebraists. Let $x_1, \ldots, x_n$ be indeterminates, and consider the polynomial

$$F(x_1, \ldots, x_n) = \prod_{i<j} (x_j - x_i).$$

Note that every pair of indeterminates occur together once in a bracket. If π is a
permutation, then $F(x_{1\pi}, \ldots, x_{n\pi})$ is also the product of all possible differences (but
some have had their signs changed). So

$$F(x_{1\pi}, \ldots, x_{n\pi}) = \mathrm{sign}(\pi) F(x_1, \ldots, x_n),$$

where sign(π) = ±1 is determined by the parity of the number of pairs {i, j} whose
order is reversed by π. Prove that

• sign is a homomorphism;

• if τ is a transposition, then sign(τ) = $-1$.
10. Recall from Section 3.8 that a preorder is a reflexive and transitive relation which 
satisfies trichotomy. Prove that the exponential generating function for the number 
of preorders on an n-set is $1/(2 - \exp(t))$. [HINT: the e.g.f. for the number of orders
is $1/(1 - t)$.]

11. (a) Show that the smallest number of transpositions of $\{1,\ldots,n\}$ whose product
is an n-cycle is n - 1.

(b) Prove that any n-cycle can be expressed in $n^{n-2}$ different ways as a product
of n - 1 transpositions. [HINT: The product of the transpositions $(x_i\ y_i)$ is an n-cycle
if and only if the pairs $\{x_i, y_i\}$ are the edges of a tree (Section 3.10). Double-count
(tree, cycle) pairs, using Cayley’s Theorem (3.10.1) and the fact that all cycles have
the same number of expressions as products of transpositions.]


6. Latin squares and SDRs 


... nets, grids, and other types of calculus ...
Alan Watts, The Book (1972). 


Television? The word is half Latin and half Greek. No good can come of it. 


C. P. Scott (attr.) 


Topics: Latin squares, SDRs, Hall’s Theorem, orthogonal Latin 
squares, quasigroups, groups, permanents 


TECHNIQUES: 
ALGORITHMS: 


CROSS-REFERENCES: Network flows (Chapter 11), affine planes and 
nets (Chapter 9), groups (Chapter 14) 


In this chapter, we examine Latin squares, showing that there are many of them 
(by means of a digression through Hall’s theorem on SDRs), and then consider 
orthogonal Latin squares. 


6.1. Latin squares 


Latin squares arise in Euler’s ‘thirty-six officers’ problem, but with one level of detail 
removed. The definition is as follows. 


A Latin square of order n is an n × n array or matrix with entries taken from the
set {1, 2, ..., n}, with the property that each entry occurs exactly once in each row
or column.¹ So, in a solution to Euler’s problem, if the officers’ ranks are numbered
from 1 to 6, they are arranged in a Latin square; and similarly for the regiments.


REMARK. Sometimes it is convenient to regard the entries as coming not from the
set {1, ..., n} but from an arbitrary given set of n elements.


¹ Why are they called Latin squares? Wait and see!

The existence of Latin squares is not in doubt. The array


is a Latin square of order 5; the construction obviously generalises. So our goal is 
to refine this observation, and come up with some estimate of how many different 
Latin squares there are. 
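
One construction that works for every order is the cyclic one, in which each row is the previous row shifted by one place. The sketch below assumes this construction for illustration (it may or may not be the array intended above) and checks the Latin square property directly; the function names are hypothetical.

```python
def cyclic_latin_square(n):
    """The n x n array with (i, j) entry (i + j) mod n, shifted to symbols 1..n."""
    return [[(i + j) % n + 1 for j in range(n)] for i in range(n)]

def is_latin_square(square):
    """Check that every row and every column contains each of 1..n exactly once."""
    n = len(square)
    symbols = set(range(1, n + 1))
    rows_ok = all(len(row) == n and set(row) == symbols for row in square)
    cols_ok = rows_ok and all({square[i][j] for i in range(n)} == symbols
                              for j in range(n))
    return rows_ok and cols_ok

for row in cyclic_latin_square(5):
    print(*row)
```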

We can interpret a Latin square as follows.² Given a class of n boys and n girls,
arrange a sequence of n dances so that each boy and girl dance together exactly
once. (The (i, j) entry of the Latin square gives the number of the dance at which
the i-th boy and the j-th girl dance together.)

Latin squares were first used in statistical design. Very roughly, suppose that n 
varieties of a crop have to be tested. A field is laid out in an n × n array of plots.
We assume that there may be some unknown but systematic variation in fertility, or 
susceptibility to insect attack, moving across or down the field; so we arrange that 
each variety is planted in one plot in each row or column, to offset this effect. 


6.2. Systems of distinct representatives 


We have to make quite a long detour to reach our goal. We prove a result known as
Hall’s Marriage Theorem; this was originally shown by Philip Hall, and a refinement
(which we need) was shown by Marshall Hall Jr.,³ but there are now many different
proofs. This result is closely connected with the theory of flows in networks, and
you may meet it in an Operations Research course. Our objective here is different.
(We return to networks in Chapter 11.)

Let $A_1, \ldots, A_n$ be sets. A system of distinct representatives (SDR) for these sets
is an n-tuple $(x_1, \ldots, x_n)$ of elements with the properties

(a) $x_i \in A_i$ for $i = 1, \ldots, n$ (i.e., representatives);

(b) $x_i \ne x_j$ for $i \ne j$ (i.e., distinct).

For any set $J \subseteq \{1,\ldots,n\}$ of indices, we define

$$A(J) = \bigcup_{j \in J} A_j.$$

(Don’t confuse this with the similar $A_J$ which occurred in PIE, where we had
intersection in place of union. Here, $A(\emptyset) = \emptyset$.)

If the sets $A_1, \ldots, A_n$ have a system of distinct representatives, then necessarily
$|A(J)| \ge |J|$ for any set $J \subseteq \{1,\ldots,n\}$, since A(J) contains the representative $x_j$ of
each set $A_j$ for $j \in J$, and these representatives are all distinct. Hall’s Theorem says
that this necessary condition is also sufficient:


² This is in the spirit of Kirkman’s Schoolgirls Problem (Chapters 1, 8).

³ No relation.

(6.2.1) Hall’s Theorem. The family $(A_1, \ldots, A_n)$ of finite sets has a system of distinct
representatives if and only if the following condition holds:


(6.2.2) Hall’s Condition (HC)

$$|A(J)| \ge |J| \text{ for all } J \subseteq \{1,\ldots,n\}.$$


Proor. We use induction on the number n of sets. The induction obviously starts: 
(HC) guarantees a representative for a single set! We call a set J of indices critical 
if |A(J)| = |J|. The motivation is that, if J is critical, then every element of A(J) 
must be used as a representative of one of these sets. We divide the proof into two 
cases: 


CASE 1. No set $J$ is critical except for $J = \emptyset$ and possibly $J = \{1,\ldots,n\}$. Let
$x_n$ be any point of $A_n$ (note $A_n \ne \emptyset$ by (HC)) and, for $j = 1,\ldots,n-1$, let
$A'_j = A_j \setminus \{x_n\}$. We claim that the family $(A'_1,\ldots,A'_{n-1})$ satisfies (HC). Take
$J \subseteq \{1,\ldots,n-1\}$, and suppose that $J \ne \emptyset$. Then

$$|A'(J)| \ge |A(J)| - 1 > |J| - 1,$$

the first inequality true since at worst $x_n$ is omitted, the second since $J$ is not critical
by assumption. So $|A'(J)| \ge |J|$, proving the claim.

By induction, $(A'_1,\ldots,A'_{n-1})$ has a SDR $(x_1,\ldots,x_{n-1})$. Then $(x_1,\ldots,x_n)$ is a
SDR for the original family, since clearly $x_n$ is distinct from all the other $x_i$.


CASE 2. Some set $J \ne \emptyset, \{1,\ldots,n\}$ is critical. We may suppose that $J$ is minimal
subject to this. Then the family $(A_j : j \in J)$ has a SDR $(x_j : j \in J)$, by induction.
For $i \notin J$, set $A^*_i = A_i \setminus A(J)$. We claim that the family $(A^*_i : i \notin J)$ satisfies (HC).
Take $K$ to be a set of indices disjoint from $J$. Then

$$|A^*(K)| = |A(J \cup K)| - |A(J)| \ge |J \cup K| - |J| = |K|,$$

the first equality since in fact $A^*(K) = A(J \cup K) \setminus A(J)$, and the inequality since
$|A(J \cup K)| \ge |J \cup K|$ by (HC), while $|A(J)| = |J|$ since $J$ is critical.
Hence, by induction, there is a SDR $(x_i : i \notin J)$ for the sets $(A^*_i : i \notin J)$. Combining this with the
SDR for the sets $(A_j : j \in J)$ gives the required result.
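
For small families, Hall's Condition and the existence of an SDR can both be checked by machine. The following is a minimal Python sketch, not taken from the text: the function names are hypothetical, `satisfies_hall` tests (HC) by brute force over all index sets J, and `find_sdr` uses the standard augmenting-path method for bipartite matching, which is a different algorithm from the induction in the proof above.

```python
from itertools import combinations

def satisfies_hall(sets):
    """Check Hall's Condition |A(J)| >= |J| for every non-empty index set J."""
    n = len(sets)
    for size in range(1, n + 1):
        for J in combinations(range(n), size):
            union = set().union(*(sets[j] for j in J))
            if len(union) < len(J):
                return False
    return True

def find_sdr(sets):
    """Return a list [x_1, ..., x_n] forming an SDR, or None if there is none.

    owner[x] is the index of the set currently represented by element x.
    """
    owner = {}

    def assign(i, seen):
        for x in sets[i]:
            if x in seen:
                continue
            seen.add(x)
            # x is free, or its current owner can move to another element
            if x not in owner or assign(owner[x], seen):
                owner[x] = i
                return True
        return False

    for i in range(len(sets)):
        if not assign(i, set()):
            return None
    sdr = [None] * len(sets)
    for x, i in owner.items():
        sdr[i] = x
    return sdr
```

In agreement with the theorem, `find_sdr` returns None exactly when `satisfies_hall` returns False; the brute-force check takes time exponential in n, so it is only practical for small families.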


This theorem is sometimes called Hall’s Marriage Theorem, because of the
following interpretation. Given a set of boys and a set of girls, each girl knowing a
specified set of boys, it is possible for all the girls to marry boys that they know if
and only if any set of $k$ girls know altogether at least $k$ boys.

(6.2.3) Hall’s Theorem Variant. Suppose that $(A_1,\ldots,A_n)$ are sets satisfying (HC),
and suppose that $|A_i| \ge r$ for $i = 1,\ldots,n$. Then the number of different SDRs for
the family is at least

$r!$ if $r \le n$,
$r(r-1)\cdots(r-n+1)$ if $r > n$.


Note. Two SDRs may use the same elements and still be different, if they assign 
different elements to the sets. For example, (1, 2) and (2, 1) are different SDRs for 
the sets ({1,2,3}, {1,2,4}). 
PROOF. This is just a variant on the proof of Hall’s Theorem. We use induction on
$n$; if $n = 1$, then a single set of size at least $r$ has at least $r$ SDRs! So assume the result true
for families with fewer than $n$ sets.
In Case 2 of the proof above, we have $r \le |J| < n$, and the family $(A_j : j \in J)$
has at least $r!$ SDRs, each of which can be extended to the whole family.
In Case 1, there are at least $r$ choices for the representative $x_n$. For each choice,
the family $(A'_i : 1 \le i \le n-1)$ consists of $n-1$ sets each of size at least $r-1$
satisfying (HC), so by induction it has at least $(r-1)!$ SDRs if $r \le n$, or at least
$(r-1)\cdots((r-1)-(n-1)+1)$ if $r > n$. Multiplying gives the result.
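
The bound can be compared with the true number of SDRs on small examples. This is an illustrative Python sketch with hypothetical function names; it counts SDRs by brute force over all n-tuples, so it is only suitable for very small families.

```python
from itertools import product
from math import factorial, prod

def count_sdrs(sets):
    """Count SDRs: tuples (x_1, ..., x_n) with x_i in A_i and all x_i distinct."""
    ordered = [sorted(s) for s in sets]
    return sum(1 for xs in product(*ordered) if len(set(xs)) == len(xs))

def hall_variant_bound(r, n):
    """Lower bound of (6.2.3): r! if r <= n, otherwise r(r-1)...(r-n+1)."""
    if r <= n:
        return factorial(r)
    return prod(range(r - n + 1, r + 1))
```

For the family ({1,2,3}, {1,2,4}) of the Note above, `count_sdrs` gives 7, while the bound with r = 3 and n = 2 is 3·2 = 6.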


We need the following consequence of Hall’s Theorem: 


(6.2.4) Theorem. Let $(A_1,\ldots,A_n)$ be a family of subsets of $\{1,\ldots,n\}$, and let $r$ be
a positive integer such that

(a) $|A_i| = r$ for $i = 1,\ldots,n$;

(b) each element of $\{1,\ldots,n\}$ is contained in exactly $r$ of the sets $A_1,\ldots,A_n$.
Then the family $(A_1,\ldots,A_n)$ satisfies (HC), and so has an SDR.


PROOF. Let $J$ be a set of indices. We count choices of $(j, x)$, where $j \in J$ and $x \in A_j$.
There are $|J|$ choices for $j$, and for each $j$ there are $r$ choices for $x \in A_j$, or $r|J|$
altogether. On the other hand, $x \in A(J)$, so there are $|A(J)|$ choices for $x$; and $x$
lies in $r$ sets, not all of which might have index in $J$, so there are at most $r$ choices
for $j$. Thus

$$r|J| \le |A(J)|\,r,$$

and since $r > 0$ we get (HC).


(6.2.5) Corollary. Under the hypotheses of the last theorem, the family of sets has 
at least r! SDRs. 


This just combines Theorem (6.2.4) with the Hall Variant (6.2.3). 


6.3. How many Latin squares? 


Now we return to Latin squares. We want to construct Latin squares ‘row by row’,
and so we want to be sure that if we have fewer than $n$ rows, there are many
ways to add another row. So we define a $k \times n$ Latin rectangle, for $k \le n$, to be a
$k \times n$ array with entries from $\{1,\ldots,n\}$, having the property that each entry occurs
exactly once in each row and at most once in each column.

(6.3.1) Proposition. Given a $k \times n$ Latin rectangle with $k < n$, there are at least
$(n-k)!$ ways to add a row to form a $(k+1) \times n$ rectangle.


PROOF. The elements of the new row must all be distinct, and each must not be
among those already used in its column. So we let $A_i$ be the set of entries not
occurring in the $i$th column of the rectangle, and we have:


$(x_1,\ldots,x_n)$ is a possible $(k+1)$st row for the rectangle if and only
if it is a SDR for the family $(A_1,\ldots,A_n)$.


Now clearly each set $A_i$ has size $n-k$, since $k$ of the $n$ entries have already
been used. Consider a particular entry, say $x$. This occurs $k$ times in the rectangle
(once in each of the rows), in $k$ distinct columns; so there are $n-k$ columns where it
does not occur. So the hypotheses of Corollary (6.2.5) are satisfied, with $r = n-k$.
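
The correspondence between new rows and SDRs is easy to try out on small rectangles. A minimal Python sketch follows (the function name is hypothetical, and the count is by brute force over all permutations, so n should be small).

```python
from itertools import permutations

def count_extensions(rect, n):
    """Count the rows that extend a k x n Latin rectangle on symbols 1..n."""
    # A_i = symbols not yet used in column i
    available = [set(range(1, n + 1)) - {row[i] for row in rect} for i in range(n)]
    # a new row is a permutation (x_1, ..., x_n) with x_i in A_i, i.e. an SDR
    return sum(
        1
        for perm in permutations(range(1, n + 1))
        if all(perm[i] in available[i] for i in range(n))
    )
```

For the 1 × 4 rectangle with row (1, 2, 3, 4) this counts the 9 derangements, against the guaranteed (4 - 1)! = 6; for the 2 × 4 rectangle with rows (1, 2, 3, 4) and (2, 1, 4, 3) it gives 4, against the guaranteed 2! = 2.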


(6.3.2) Theorem. The number of Latin squares of order $n$ is at least

$$n!\,(n-1)!\cdots 2!\,1!.$$

PROOF. Add rows one at a time: there are at least $n!$ choices for the first row, at
least $(n-1)!$ for the second, and so on.


This problem incorporates two counting problems we met earlier. The first row
of a Latin square of order $n$ is simply a permutation of $\{1,\ldots,n\}$, and there are
exactly $n!$ choices for it. Given the first row, we may (by re-labelling) assume that
it is $(1\ 2\ \ldots\ n)$; then a legitimate second row is precisely a permutation $\pi$ satisfying
$i\pi \ne i$ for $i = 1,\ldots,n$, that is, a derangement. We know that the number of
derangements is the nearest integer to $n!/e$; for $n \ge 4$, this is better than the lower
bound of $(n-1)!$ which we used, so the estimate for the number of Latin squares
can be improved a bit. However, the number of choices of the third row depends on
the way the first two rows were chosen, so we cannot get the exact answer simply
by multiplying $n$ numbers together.


EXAMPLE. There are 2 Latin squares of order 2, and $3! \cdot 2! = 12$ of order 3. However,
for order 4, there are $24 \cdot 3$ choices of the first two rows which can be extended in
4 different ways, and $24 \cdot 6$ which have just 2 extensions; so the number of Latin
squares is

$$24 \cdot 3 \cdot 4 + 24 \cdot 6 \cdot 2 = 576.$$
(See Exercise 1.)
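
The row-by-row construction can be turned into a small counting program. The sketch below is an illustration in Python (function names are hypothetical); it enumerates all squares, so it is feasible only for very small n.

```python
from itertools import permutations
from math import factorial, prod

def count_latin_squares(n):
    """Count n x n Latin squares on symbols 0..n-1 by adding rows one at a time."""
    rows = list(permutations(range(n)))

    def extend(used_in_column, k):
        if k == n:
            return 1
        total = 0
        for row in rows:
            # the new row must avoid the symbols already used in each column
            if all(row[i] not in used_in_column[i] for i in range(n)):
                for i in range(n):
                    used_in_column[i].add(row[i])
                total += extend(used_in_column, k + 1)
                for i in range(n):
                    used_in_column[i].remove(row[i])
        return total

    return extend([set() for _ in range(n)], 0)

def factorial_lower_bound(n):
    """The lower bound n!(n-1)!...1! of Theorem (6.3.2)."""
    return prod(factorial(k) for k in range(1, n + 1))
```

For n = 1, 2, 3, 4 the counts are 1, 2, 12, 576, and the lower bound of Theorem (6.3.2) is 1, 2, 12, 288; so the bound is exact up to order 3 and falls short by a factor of 2 at order 4.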
Remark. Let $L(n)$ be the number of Latin squares of order $n$. We have shown that
$L(n) \ge n!\,(n-1)!\cdots 1!$. This bound was improved, about fifteen years ago, to

$$L(n) \ge (n!)^{2n}/n^{n^2}.$$

We explore this in Section 6.5. On the other hand, we have

$$L(n) \le n^{n^2},$$

since there are $n^{n^2}$ ways of filling in the $n^2$ positions of the array with entries from
$\{1,\ldots,n\}$. We can improve this to

$$L(n) \le (n!)^n$$

by observing that each entire row is chosen from the set of permutations of $\{1,\ldots,n\}$,
and there are $n!$ permutations. A further improvement is made by noticing that all
the rows after the first are derangements of the first row, so roughly $L(n) \le (n!)^n/e^{n-1}$.

To compare these bounds, it is helpful to estimate $\log L(n)$ rather than $L(n)$
itself. The simplest possible upper bound, namely $L(n) \le n^{n^2}$, gives

$$\log L(n) \le n^2 \log n.$$

On the other hand, we have

$$\log L(n) \ge \sum_{k=1}^{n} \log k! \ge \sum_{k=1}^{n} k(\log k - 1) = \tfrac12 n^2 \log n + O(n^2),$$

where we used the simple bound $k! \ge (k/e)^k$ from Chapter 2, Exercise 3. So roughly
the upper and lower bounds for $\log L(n)$ differ by a factor of 2. The improved lower
bound mentioned above removes this factor, giving

$$\log L(n) = n^2 \log n + O(n^2).$$


6.4. Quasigroups 


There is another way of looking at Latin squares. Let $G = \{g_1,\ldots,g_n\}$. If $A = (a_{ij})$
is any $n \times n$ matrix with entries from the set $\{1,\ldots,n\}$, we can define a binary
operation, or ‘multiplication’, on $G$ by the rule

$$g_i \circ g_j = g_k \text{ if and only if } a_{ij} = k.$$

Conversely, any binary operation on $G$ gives rise to such a matrix, once we have
numbered the elements of $G$ as $g_1,\ldots,g_n$.⁴


A binary structure like $G$ above is called a quasigroup if the following axioms
hold:

(left division) for all $g_j, g_k \in G$, there is a unique $g_i \in G$ with $g_i \circ g_j = g_k$;

(right division) for all $g_i, g_k \in G$, there is a unique $g_j \in G$ with $g_i \circ g_j = g_k$.
Now the following result follows from the definitions:


(6.4.1) Proposition. A binary structure G is a quasigroup if and only if the corre- 
sponding matrix A is a Latin square. 


4 The matrix is the multiplication table of the binary structure.

PROOF. The left divisibility condition just says that each column of the matrix
contains each entry exactly once; and similarly for right division.


There are various advantages to turning Latin squares into algebraic objects 
like quasigroups. For one thing, we can obtain a kind of measure of the strength of
various algebraic axioms by seeing how many Latin squares correspond to structures 
satisfying these axioms. For example, there are very many quasigroups; but there 
are many fewer groups (see next section), so the group axioms are very powerful! 
Another is that algebraic constructions can be transferred to Latin squares. One 
example of this is the direct product. 


Let $G$ and $H$ be binary structures (the binary operation in each of them will be
denoted by $\circ$). The direct product $G \times H$ is defined, just as for groups, as follows: it
is the set of ordered pairs $(g, h)$, for $g \in G$, $h \in H$, with operation

$$(g_1, h_1) \circ (g_2, h_2) = (g_1 \circ g_2, h_1 \circ h_2).$$

Now it is easily established that the direct product of quasigroups is a quasigroup.
(For left divisibility, suppose that $g_2$, $h_2$ and the product $(g_3, h_3)$ on the right-hand side
of the above equation are given. Then $g_1$ is determined by left divisibility in $G$, and
similarly $h_1$ in $H$.)


The direct product can be translated into a direct product operation on Latin 
squares, which we write with the same notation, i.e. the direct product of A and B 
is A x B. This is considerably more complicated to define directly, although the idea 
is simple. For example, we have: 
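
The quasigroup definition translates directly into a construction on arrays. The following Python sketch is an illustration and not the book's own example: symbols are taken to be 0, ..., n-1 rather than 1, ..., n, the pair (g, h) is encoded as the single number g·m + h, and the function names are hypothetical.

```python
def is_latin(square):
    """True if every row and column of the n x n array is a permutation of 0..n-1."""
    n = len(square)
    symbols = set(range(n))
    rows_ok = all(set(row) == symbols for row in square)
    cols_ok = all({square[i][j] for i in range(n)} == symbols for j in range(n))
    return rows_ok and cols_ok

def direct_product(A, B):
    """Multiplication table of the direct product of the quasigroups of A and B.

    Rows and columns are indexed by pairs (i1, i2) and (j1, j2), listed in
    lexicographic order; the entry is the pair (A[i1][j1], B[i2][j2]),
    encoded as A[i1][j1] * m + B[i2][j2] with m = len(B).
    """
    n, m = len(A), len(B)
    return [
        [A[i1][j1] * m + B[i2][j2] for j1 in range(n) for j2 in range(m)]
        for i1 in range(n)
        for i2 in range(m)
    ]
```

Applied to the cyclic squares of orders 2 and 3, this produces a Latin square of order 6 whose first row is 0, 1, ..., 5.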


6.5. Project: Quasigroups and groups 


The best-known examples of quasigroups are groups: these are quasigroups with 
an identity element whose composition is associative. In this section, we describe 
a refinement of the estimate for the number of quasigroups, using the proof of the 
van der Waerden permanent conjecture; and we show that two of the most basic 
theorems about groups (Lagrange’s Theorem and Cayley’s Theorem)⁵ can be used
to put an upper bound on the number of groups. We see that groups are very rare
among quasigroups; in other words, the associative law is a very powerful condition.


QUASIGROUPS: PERMANENTS AND SDRs. 

Our lower bound for the number of quasigroups comes from the van der Waerden permanent 
conjecture, whose truth was shown by Egorychev and Falikman (independently). First we need a 
couple of definitions, whose relevance will not be immediately apparent! 


5 These theorems and their historical context are described in Chapter 14.

A matrix is said to be stochastic if its entries are non-negative real numbers and its row sums are
equal to 1. The term suggests a connection with probability. A system is initially in one of $m$ states
$S_1,\ldots,S_m$, and can make a transition to one of $n$ states $T_1,\ldots,T_n$. If the probability of jumping
from $S_i$ to $T_j$ is $p_{ij}$, then the $m \times n$ matrix with $(i,j)$ entry $p_{ij}$ is stochastic. A stochastic matrix is
called doubly stochastic if, in addition, its column sums are all equal to 1. (This implies in particular
that the matrix is square.) This condition doesn’t have an obvious probabilistic interpretation.

Let $A$ be an $n \times n$ matrix with $(i,j)$ entry $a_{ij}$. Then there is a well-known formula for the
determinant of $A$:

$$\det(A) = \sum_{\pi \in S_n} \operatorname{sign}(\pi) \prod_{i=1}^{n} a_{i\,i\pi}.$$

(Recall our convention that permutations act on the right, and the definition of the sign of a
permutation in Chapter 5.)
If we leave out the sign factor in this expression, we obtain the permanent of $A$:

$$\operatorname{per}(A) = \sum_{\pi \in S_n} \prod_{i=1}^{n} a_{i\,i\pi}.$$


Though the formula is simpler, the permanent is much harder to manipulate or evaluate than the
determinant! It is clear that the matrix with every entry $1/n$ has permanent $n!/n^n$ (the sum has $n!$
terms, each the product of $n$ factors $1/n$).

The van der Waerden permanent conjecture asserted: 


The permanent of an $n \times n$ doubly stochastic matrix $A$ is at least $n!/n^n$, with
equality if and only if every entry of $A$ is equal to $1/n$.


This conjecture was proved in 1979-1980, independently, by Egorychev and Falikman. Earlier, Bang
and Friedland had shown the slightly weaker result that the permanent of a doubly stochastic matrix
is at least $e^{-n}$. (Note that $e^{-n} < n!/n^n$, by Exercise 3 of Chapter 2.) If you want to see how it was
done, Marshall Hall’s Combinatorial Theory (1989) contains an exposition.


What is the relevance to this chapter? Given a family $(A_1,\ldots,A_n)$ of $n$ subsets of $\{1,\ldots,n\}$,
we define the incidence matrix $A$ of the family by the rule that the $(i,j)$ entry of $A$ is given by

$$a_{ij} = \begin{cases} 1 & \text{if } j \in A_i, \\ 0 & \text{if } j \notin A_i. \end{cases}$$

Then we have:


(6.5.1) Proposition. With the above notation, $\operatorname{per}(A)$ is equal to the number of SDRs of the family
of sets.


PROOF. In the evaluation of the permanent, the product corresponding to a permutation $\pi$ is zero
unless $i\pi \in A_i$ for all $i$, when it is one. In this case, $(1\pi,\ldots,n\pi)$ is a SDR for $(A_1,\ldots,A_n)$.
Conversely, any SDR arises from such a permutation. Hence the permanent is equal to the number
of SDRs.
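
Proposition (6.5.1) can be checked numerically on small families. The Python sketch below is an illustration with hypothetical function names; it takes the rows of the incidence matrix to be indexed by the sets (the permanent is unchanged by transposition, so the other convention gives the same value), and it evaluates the permanent straight from the definition, which takes time proportional to n·n!.

```python
from itertools import permutations, product

def incidence_matrix(sets, n):
    """0/1 matrix whose (i, j) entry is 1 exactly when point j+1 lies in sets[i]."""
    return [[1 if j + 1 in sets[i] else 0 for j in range(n)] for i in range(n)]

def permanent(M):
    """Permanent from the definition: sum over permutations of products of entries."""
    n = len(M)
    total = 0
    for perm in permutations(range(n)):
        term = 1
        for i in range(n):
            term *= M[i][perm[i]]
        total += term
    return total

def count_sdrs(sets):
    """Brute-force count of tuples (x_1, ..., x_n), x_i in A_i, all distinct."""
    ordered = [sorted(s) for s in sets]
    return sum(1 for xs in product(*ordered) if len(set(xs)) == len(xs))
```

For the family ({1,2}, {2,3}, {1,3}) both `permanent(incidence_matrix(...))` and `count_sdrs(...)` give 2.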


(6.5.2) Proposition. Let $(A_1,\ldots,A_n)$ be a family of subsets of $\{1,\ldots,n\}$. Suppose that
• each set $A_i$ has cardinality $r$;
• each point $i$ lies in $r$ of the sets $A_1,\ldots,A_n$.

Then the number of SDRs of the family is at least $n!\,(r/n)^n$.


Remark. You should stop and compare this with the lower bound $r!$ proved in Corollary (6.2.5).


PROOF. The incidence matrix $A$ has all row and column sums $r$. So $(1/r)A$ is doubly stochastic,
whence $\operatorname{per}((1/r)A) \ge n!/n^n$, from which the result follows since $\operatorname{per}((1/r)A) = (1/r)^n \operatorname{per}(A)$.




(6.5.3) Proposition. The number $L(n)$ of $n \times n$ Latin squares satisfies

$$L(n) \ge (n!)^{2n}/n^{n^2}.$$


PROOF. Just as in Section 6.3, we have

$$L(n) \ge \prod_{r=1}^{n} n!\,(r/n)^n = (n!)^{2n}/n^{n^2}.$$
r=l 


What about the number of quasigroups? Given a quasigroup, if we number its elements 1,..., n 
in any order, its multiplication table is a Latin square. So each quasigroup gives at most n! Latin 
squares; this is insignificant compared with $L(n)$, and the estimate $n^2 \log n + O(n^2)$ holds for the
logarithm of the number of quasigroups too. 


GROUPS: LAGRANGE AND CAYLEY. 


We will now show that the number of groups is very small compared with the number of quasigroups.
If $G(n)$ is the number of groups of order $n$, we prove that $G(n) \le n^{n \log_2 n}$. In other words,
$\log G(n) = O(n(\log n)^2)$, much smaller than $\log L(n)$.

The proof, not surprisingly, requires some algebra. In fact, little is needed; just two of the 
basic theorems proved in the nineteenth century. (Using more powerful tools, better estimates can be 
derived.) The results we need are: 

• Lagrange’s Theorem: The order of a subgroup of a group $G$ divides the order of $G$.
• Cayley’s Theorem: Any group of order $n$ is isomorphic to a subgroup of the symmetric group
$S_n$.
We also need the concept of the subgroup $H$ generated by a set $\{g_1,\ldots,g_k\}$ of elements of $G$. This
is the smallest subgroup of $G$ containing $g_1,\ldots,g_k$, and consists of all elements of $G$ which can be
written as products of these elements and their inverses. (See Chapter 14 for further discussion.)


(6.5.4) Lemma. A group $G$ of order $n$ can be generated by at most $\log_2 n$ elements.


PROOF. We prove by induction that if $g_1, g_2, \ldots$ are chosen so that, for all $k$, $g_{k+1}$ does not lie in
the subgroup $G_k$ generated by $g_1,\ldots,g_k$ (and $g_1 \ne 1$), then the order of $G_k$ is at least $2^k$. For
the inductive step, $|G_{k+1}| > |G_k|$ (since $g_{k+1} \in G_{k+1} \setminus G_k$), and $|G_k|$ divides $|G_{k+1}|$ by Lagrange’s
Theorem; so we have $|G_{k+1}| \ge 2|G_k|$, and the induction goes through.

By Cayley’s Theorem, the number $G(n)$ of groups of order $n$ (up to isomorphism) is no greater
than the number of subgroups of order $n$ of $S_n$. By the Lemma, this number does not exceed the
number of choices of $\log_2 n$ elements of $S_n$; so

$$G(n) \le (n!)^{\log_2 n} \le n^{n \log_2 n}.$$


6.6. Orthogonal Latin squares 


Two Latin squares $A = (a_{ij})$ and $B = (b_{ij})$ are said to be orthogonal if, for any
pair $(k,l)$ of elements from $\{1,\ldots,n\}$, there are unique values of $i$ and $j$ such that
$a_{ij} = k$, $b_{ij} = l$; in other words, there is a unique position where $A$ has entry $k$
and $B$ has entry $l$. A set $\{A_1,\ldots,A_r\}$ of Latin squares is called a set of mutually
orthogonal Latin squares, or set of MOLS for short, if any two squares in the set are
orthogonal. (Sometimes the terms pairwise orthogonal and POLS are used instead.)


Sometimes a pair of orthogonal Latin squares is called a Graeco-Latin square. 
The reason comes from a different representation sometimes used. Instead of 
numbers, the entries can be taken from any set of size n; the first n letters of the 
alphabet are commonly used. Now if we use letters of different alphabets, say the
Latin alphabet for $A$ and the Greek for $B$, then the two squares can be
combined into one unambiguously; and $A$ and $B$ are orthogonal if and only if each
combination of a Latin and a Greek letter occurs exactly once in the square. For 
example, here are two orthogonal Latin squares of order 3 and the corresponding 
Graeco-Latin square. 


A question which has had much attention is: 
What is the maximum size of a set of MOLS of order n? 


This question is closely connected with the existence question for projective and 
affine planes, as we will see in Chapter 9. 


Let $f(n)$ denote the maximum number of MOLS of order $n$. We observe first
that $f(n) \le n-1$ for all $n$. For let $A_1,\ldots,A_r$ be mutually orthogonal Latin squares;
without loss of generality, we may assume that each square has (1,1) entry 1. Now
each square has $n-1$ further entries 1, none occurring in the first row or column;
and, by orthogonality, these 1s cannot occur in the same position in two different
squares. Since there are only $(n-1)^2$ available positions, there cannot be more than
$n-1$ squares.

(6.6.1) Proposition. If $n$ is a prime power, then $f(n) = n-1$.


This result uses Galois’ Theorem on the existence of finite fields (see Section 
4.7). We use the fact that there is a field F of any given prime power order n. 
Now we take the elements of F to index the rows and columns of all the squares. 
For each non-zero element $m \in F$, we define a matrix $A_m$ whose $(i,j)$ entry is
$$(A_m)_{ij} = im + j.$$

Now each $A_m$ is a Latin square. For, if $im + j_1 = im + j_2$, then $j_1 = j_2$; and, if
$i_1 m + j = i_2 m + j$, then $i_1 m = i_2 m$, and so $i_1 = i_2$ (since $m$ is non-zero and so has
an inverse).

Moreover, these squares are orthogonal. For, given elements $a, b \in F$, and
$m_1 \ne m_2$, the equations

$$i m_1 + j = a,$$
$$i m_2 + j = b$$

have a unique solution $(i, j)$: subtracting gives $i(m_1 - m_2) = a - b$, which determines $i$
since $m_1 - m_2$ is non-zero, and then $j = a - i m_1$.
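
The construction is easy to carry out when n is a prime p, since the integers mod p then form a field. The Python sketch below is restricted to that case (an assumption made for simplicity: a general prime power would need arithmetic in the finite field of that order, which is not implemented here), and the function names are hypothetical. Rows, columns and symbols are 0, ..., p-1.

```python
def mols_prime(p):
    """The p - 1 squares A_m with (i, j) entry i*m + j (mod p), for m = 1..p-1.

    Assumes p is prime, so that every non-zero m has an inverse mod p.
    """
    return [[[(i * m + j) % p for j in range(p)] for i in range(p)] for m in range(1, p)]

def are_orthogonal(A, B):
    """True if the pairs (A[i][j], B[i][j]) are all different over the n^2 cells."""
    n = len(A)
    pairs = {(A[i][j], B[i][j]) for i in range(n) for j in range(n)}
    return len(pairs) == n * n
```

For p = 5 this gives four squares, any two of which pass the `are_orthogonal` test, in agreement with f(5) = 4.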


This doesn’t appear to help evaluate f(n) in general. But it gives us a lower 
bound. To show this, we use the direct product construction for Latin squares, and 
make the following observation: 


6 The Latin letters alone form a Latin square. (Hence the name.)

(6.6.2) Proposition. If $A_1$ and $A_2$ are orthogonal Latin squares of order $n$, and $B_1$
and $B_2$ orthogonal Latin squares of order $m$, then the Latin squares $A_1 \times B_1$ and
$A_2 \times B_2$ of order $nm$ are orthogonal.


PROOF. As we saw in Section 6.4, direct products are easier to define for
quasigroups. So we re-formulate orthogonality for quasigroups. For convenience, we
take the same set $G = \{g_1,\ldots,g_n\}$ of symbols for both quasigroups, but distinguish
the binary operations. Let $(G,\circ)$ and $(G,*)$ be quasigroups. These quasigroups are
said to be orthogonal if the following holds:
(orthogonality) for all $g_k, g_l \in G$, there exist unique elements $g_i, g_j \in G$ such that
$g_i \circ g_j = g_k$ and $g_i * g_j = g_l$.
This is equivalent to orthogonality of the corresponding squares. Now it is a simple
exercise to prove that, if $(G,\circ)$ and $(G,*)$ are orthogonal quasigroups, and $(H,\circ)$
and $(H,*)$ are another pair of orthogonal quasigroups (possibly of different order),
then $(G \times H,\circ)$ and $(G \times H,*)$ are also orthogonal.


(6.6.3) Proposition. Let $n = p_1^{a_1} \cdots p_r^{a_r}$, where $p_1,\ldots,p_r$ are distinct primes and
$a_1,\ldots,a_r > 0$, and let $q$ be the minimum of $p_1^{a_1},\ldots,p_r^{a_r}$. Then $f(n) \ge q-1$.


PROOF. Let $q_i = p_i^{a_i}$. Then we can find $q_1 - 1$ MOLS of order $q_1$, $q_2 - 1$ of order
$q_2$, and so on. Since a subset of a set of MOLS is again a set of MOLS, if $q$ is the
minimum of $q_1, q_2, \ldots$, we can find $q-1$ MOLS of each of these orders; taking their
products gives a set of $q-1$ MOLS of order $n$.
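
The lower bound of Proposition (6.6.3) depends only on the factorisation of n, so it is simple to tabulate. A small Python sketch follows (the function name is hypothetical; trial division is used, which is adequate for the small orders of interest here).

```python
def mols_lower_bound(n):
    """Return q - 1, where q is the smallest prime power in the factorisation of n."""
    if n < 2:
        raise ValueError("n must be at least 2")
    prime_powers = []
    d = 2
    while d * d <= n:
        if n % d == 0:
            q = 1
            while n % d == 0:  # collect the full power of the prime d
                n //= d
                q *= d
            prime_powers.append(q)
        d += 1
    if n > 1:  # whatever remains is a prime occurring to the first power
        prime_powers.append(n)
    return min(prime_powers) - 1
```

For example, the bound is 2 for n = 12 and n = 15, and 8 for n = 9, but only 1 for n = 6 and n = 10, where the Proposition gives no information.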


REMARK. More generally, we have

$$f(n_1 n_2) \ge \min\{f(n_1), f(n_2)\}.$$

(6.6.4) Corollary. If $n \not\equiv 2 \pmod 4$, then there exist two orthogonal Latin squares
of order $n$.


PROOF. If $q = 2$ in the Proposition, then $n$ is divisible by 2 but not by 4, so that
$n \equiv 2 \pmod 4$. So if $n \not\equiv 2 \pmod 4$, then $q \ge 3$ and $f(n) \ge q - 1 \ge 2$.

Euler conjectured that the converse is also true; in other words, that if $n \equiv 2$
(mod 4), then orthogonal Latin squares of order $n$ do not exist. For $n = 6$, this is his ‘thirty-six
officers’ problem posed in the first chapter. It turned out that Euler was right about
the 36 officers (no solution exists), but wrong for all larger values of $n$. More
generally, it is known that $f(n) \to \infty$ as $n \to \infty$. (This means that, for any given $r$,
there exist $r$ MOLS of order $n$ for all but finitely many values of $n$. For example,
two MOLS of order $n$ exist for all $n$ except $n = 1, 2$ and 6.)


6.7. Exercises 


1. (a) Show that the number of $n \times n$ Latin squares is 1, 2, 12, 576 for $n = 1, 2, 3, 4$
respectively.

(b) Prove that, up to permutations of the rows, columns, and symbols in a Latin 
square, there are unique squares of orders 1, 2, 3, and two different squares of 
order 4. 

(c) Show that one of the two types of Latin square of order 4 has an orthogonal
‘mate’ and the other does not.

2. Show that, for $n \le 4$, any Latin square of order $n$ can be obtained from the
multiplication table of a group by permuting rows, columns, and symbols; but this
is not true for $n = 5$.

3. A Latin square $A = (a_{ij})$ of order $n$ is said to be row-complete if every ordered
pair $(x,y)$ of distinct symbols occurs exactly once in consecutive positions in the
same row (i.e., as $(a_{ij}, a_{i,j+1})$ for some $i,j$). (Note that there are $n(n-1)$ ordered
pairs of distinct symbols, and each of the $n$ rows contains $n-1$ consecutive pairs
of symbols.)

(a) Prove that there is no row-complete Latin square of order 3 or 5, and
construct one of order 4.

(b) Define analogously a column-complete Latin square.

(c) Suppose that the elements of $\mathbb{Z}/(n)$ are written in a sequence $(x_1, x_2, \ldots, x_n)$
with the property that every non-zero element of $\mathbb{Z}/(n)$ can be written uniquely in
the form $x_{i+1} - x_i$ for some $i = 1,\ldots,n-1$. Let $A$ be the Latin square (with rows,
columns and entries indexed by $0,\ldots,n-1$ instead of $1,\ldots,n$) whose $(i,j)$ entry is
$a_{ij} = x_i + x_j$. (This is the addition table of $\mathbb{Z}/(n)$, written in a strange order.) Prove
that $A$ is both row-complete and column-complete.

(d) If $n$ is even, show that the sequence

$$(0, 1, n-1, 2, n-2, \ldots, \tfrac12 n - 1, \tfrac12 n + 1, \tfrac12 n)$$
has the property described in (c).⁷
REMARK. Row- and column-complete Latin squares are useful for experimental 
design where adjacent plots may interact. 
4. (a) Find a family of three subsets of a 3-set having exactly three SDRs.
(b) How many SDRs does the family

$$\{\{1,2,3\}, \{1,4,5\}, \{1,6,7\}, \{2,4,6\}, \{2,5,7\}, \{3,4,7\}, \{3,5,6\}\}$$
have?⁸


5. Let $(A_1,\ldots,A_n)$ be a family of subsets of $\{1,\ldots,n\}$. Suppose that the incidence
matrix of the family is invertible. Prove that the family possesses a SDR.


6. Use the truth of the van der Waerden permanent conjecture to prove that the
number $d(n)$ of derangements of $\{1,\ldots,n\}$ satisfies

$$d(n) \ge n!\left(1 - \frac{1}{n}\right)^n.$$
How does this estimate compare with the truth?


7. Prove the following generalisation of Hall’s Theorem: 


If a family $(A_1,\ldots,A_n)$ of subsets of $X$ satisfies $|A(J)| \ge |J| - r$
for all $J \subseteq \{1,\ldots,n\}$, then there is a subfamily of size $n - r$ which
has a SDR.


[Hint: add $r$ ‘dummy’ elements which belong to all the sets.]
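
As a sanity check on Exercise 3(c) and (d) (a numerical check for small even n, not a proof), the sequence of part (d) and the square of part (c) can be built and tested by machine. This is a Python sketch with hypothetical function names; a Latin square is row-complete exactly when its n(n - 1) consecutive pairs are all different, which is what `is_row_complete` tests.

```python
def zigzag_sequence(n):
    """The sequence (0, 1, n-1, 2, n-2, ..., n/2 - 1, n/2 + 1, n/2); n must be even."""
    if n % 2:
        raise ValueError("n must be even")
    seq = [0]
    for t in range(1, n // 2):
        seq += [t, n - t]
    seq.append(n // 2)
    return seq

def addition_square(seq, n):
    """Addition table of Z/(n) with rows and columns listed in the order of seq."""
    return [[(x + y) % n for y in seq] for x in seq]

def is_row_complete(square):
    """True if the n(n-1) consecutive pairs within rows are all different."""
    n = len(square)
    pairs = {(row[j], row[j + 1]) for row in square for j in range(n - 1)}
    return len(pairs) == n * (n - 1)

def is_column_complete(square):
    """Row-completeness of the transposed square."""
    return is_row_complete(list(zip(*square)))
```

For n = 4 the sequence is (0, 1, 3, 2), with consecutive differences 1, 2, 3 mod 4.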


7 I am grateful to Rosemary Bailey for this exercise. 
8 This family is the set of triples of the Steiner triple system of order 7; see Chapter 8.


7. Extremal set theory 


Commonest family name. The Chinese name Zhang is borne, according to 
estimates, by between 9.7 and 12.1 per cent of the Chinese population, so 
indicating even on the lower estimate that there are at least some 104 million 
Zhangs. 


Peter Matthews (ed.), The Guinness Book of Records (1993). 


TOPICS: Intersecting families; Sperner families; de Bruijn-Erdős
Theorem; [regular families]


TECHNIQUES: LYM method 
ALGORITHMS: 


CROSS-REFERENCES: Hall’s Theorem (Chapter 6); Steiner triple sys- 
tems (Chapter 8), projective planes (Chapter 9) 


Extremal set theory considers families of subsets of a set satisfying some restriction 
(perhaps in terms of inclusion or intersection of its members). It then asks the 
questions: 

• What is the maximal size of such a family?

• Can one describe all families which meet this bound?
Like many topics, it is best introduced by example. In this chapter, we'll consider 
three example results in extremal set theory. In the first, the proof of the bound is 
trivial, but there are far too many families meeting it to allow any decent description. 
The second is just the opposite: the proof of the bound is quite ingenious, but not 
much more work is needed to give a precise description of families meeting it. The 
last case is somewhere between; it is included because it ties in with another of our 
topics, finite geometry. 


Let $X = \{1,2,\ldots,n\}$. The set of all subsets of $X$ is called the power set of $X$,
and denoted $\mathcal P(X)$, or sometimes $2^X$. (The latter notation relates to the fact that
$|\mathcal P(X)| = 2^{|X|}$, with a natural bijection between these sets, as we saw in Chapters 2
and 3.) By a family of sets is meant a subset $\mathcal F$ of $\mathcal P(X)$. The conditions we will
impose on a family all relate to pairs of sets in the family; they are as follows:

(a) any two sets have non-empty intersection; 
(b) no set contains another; 
(c) two sets have exactly one common point.

7.1. Intersecting families

A family $\mathcal F$ of subsets of $X$ is intersecting if $A, B \in \mathcal F \Rightarrow A \cap B \ne \emptyset$.


(7.1.1) Proposition. An intersecting family $\mathcal F$ of subsets of $\{1,\ldots,n\}$ satisfies $|\mathcal F| \le
2^{n-1}$. Moreover, there are intersecting families of size $2^{n-1}$.


PROOF. Let $X = \{1,\ldots,n\}$. The $2^n$ subsets of $X$ can be divided into $2^{n-1}$
complementary pairs $\{A, X \setminus A\}$; clearly an intersecting family contains at most one
set from each pair. This proves the bound. But the family of all sets containing a
particular element (say 1) of $X$ has cardinality $2^{n-1}$ and is intersecting.


There are far too many intersecting families of size $2^{n-1}$ for there to be any
hope of classifying them. Here are a couple of examples in addition to the ones in 
the proof of the Proposition. 


EXAMPLE 1. If $n$ is odd, the set of all subsets $A$ containing more than half the points
of $X$ is intersecting, and has size $2^{n-1}$ (since, as required by the proof, it does contain
one of each pair of complementary sets). If $n$ is even, we modify the construction as
follows: take all sets with strictly more than $n/2$ points; then divide the sets of size
$n/2$ into complementary pairs, and take one of each pair in any manner whatever.
This gives lots of different examples. (Note that if $|A| > n/2$ and $|B| \ge n/2$, then

$$|A \cap B| = |A| + |B| - |A \cup B| > n/2 + n/2 - n = 0,$$

and two sets of size $n/2$ which are not complementary cannot be disjoint;
so the families constructed really are intersecting.)


EXAMPLE 2. Let $X = \{1,\ldots,7\}$, and let $\mathcal B$ consist of the seven subsets
$$\{1,2,3\}, \{1,4,5\}, \{1,6,7\}, \{2,4,6\}, \{2,5,7\}, \{3,4,7\}, \{3,5,6\}.$$


(Then $(X, \mathcal B)$ is a Steiner triple system of order 7; see the next chapter for
definitions.) Let $\mathcal F$ be the set of all those subsets of $X$ which contain a member of
$\mathcal B$. Then $\mathcal F$ is intersecting, and $|\mathcal F| = 64 = 2^{7-1}$ (see Exercise 1).
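
The claims of Example 2 can be confirmed by listing all 128 subsets of X. A minimal Python sketch (the names `FANO`, `all_subsets` and `is_intersecting` are chosen here for illustration):

```python
from itertools import combinations

# the seven triples of Example 2
FANO = [{1, 2, 3}, {1, 4, 5}, {1, 6, 7}, {2, 4, 6}, {2, 5, 7}, {3, 4, 7}, {3, 5, 6}]

def all_subsets(n):
    """All subsets of {1, ..., n}, as frozensets."""
    points = range(1, n + 1)
    return [frozenset(c) for k in range(n + 1) for c in combinations(points, k)]

def is_intersecting(family):
    """True if any two members of the family have a common point."""
    return all(a & b for a, b in combinations(family, 2))

# all subsets of X containing at least one of the seven triples
family = [s for s in all_subsets(7) if any(triple <= s for triple in FANO)]
```

The resulting family has 64 members and passes the `is_intersecting` test, as does the family of all subsets containing the point 1.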


If we further restrict the sets to all have the same size $k$, what can be said? If
$n < 2k$, then any family of $k$-subsets of an $n$-set is intersecting, and there is no
restriction; so we should assume that $n \ge 2k$ to get meaningful results. If $n = 2k$,
then an intersecting family contains at most one of each pair of disjoint sets, and
so contains at most $\frac12\binom{2k}{k} = \binom{2k-1}{k-1}$ sets. In general, there is always an intersecting
family of size $\binom{n-1}{k-1}$ consisting of all those $k$-sets containing some fixed point of $X$;
and, for large enough $n$, this is best possible. More generally, there is a $t$-intersecting
family $\mathcal F$ of $k$-sets (i.e., satisfying $|F_1 \cap F_2| \ge t$ for all $F_1, F_2 \in \mathcal F$) of size $\binom{n-t}{k-t}$
(consisting of all $k$-sets containing a fixed $t$-set), and this is also best possible for
large enough $n$:¹


1 Unusually for the twentieth century, this theorem was proved in 1947, but was not published until
1963.

(7.1.2) Erdős-Ko-Rado Theorem. Given $k$ and $t$, there exist $n_1$, $n_2$ such that

(a) if $n > n_1$, a $t$-intersecting family of $k$-subsets of an $n$-set has size at most $\binom{n-t}{k-t}$;
(b) if $n > n_2$, a $t$-intersecting family of $k$-subsets of an $n$-set which has size $\binom{n-t}{k-t}$
consists of all $k$-subsets containing some $t$-subset of the $n$-set.


A special case of this theorem is given as Exercises 2 and 3. 


7.2. Sperner families 


The family $\mathcal F$ of sets is called a Sperner family if no member of $\mathcal F$ properly contains
any other, that is,
$$A, B \in \mathcal F \Rightarrow A \not\subset B \text{ and } B \not\subset A.$$


For any fixed $k$, the set of all subsets of $X$ of size $k$ forms a Sperner family
containing $\binom{n}{k}$ sets. Since the binomial coefficients increase to the midpoint and
then decrease, the largest Sperner families of this type occur when $k = n/2$ (if $n$ is
even) and when $k = (n-1)/2$ or $(n+1)/2$ (if $n$ is odd). It turns out that these are
the largest Sperner families without restriction.


(7.2.1) Sperner’s Theorem. Let $\mathcal F$ be a Sperner family of subsets of the $n$-element
set $X$. Then $|\mathcal F| \le \binom{n}{\lfloor n/2 \rfloor}$. Moreover, if equality holds, then $\mathcal F$ consists either of all
subsets of $X$ of size $\lfloor n/2 \rfloor$, or all subsets of size $\lceil n/2 \rceil$ (these are the same if $n$ is
even).


PROOF. The ingenious proof uses the concept of a chain of subsets, a sequence
$$\emptyset = A_0 \subset A_1 \subset \cdots \subset A_n = X.$$


How many chains are there? If $\pi$ is any permutation, then we get a chain by setting
$A_i = \{1\pi,\ldots,i\pi\}$ for $i = 0,\ldots,n$. Conversely, in a chain, the points are added one
at a time, so we can uniquely recover the permutation. Thus there are as many
chains as permutations, viz. $n!$.

Next we ask: How many chains contain a fixed set $A$? If $|A| = k$, then it must
occur that $A = A_k$, and the chain is obtained by welding together a chain for $A$ and
a chain for $X \setminus A$. So $A$ lies in $k!\,(n-k)!$ chains, a proportion $1/\binom{n}{k}$ of the total.
We could also see this by observing that each of the $\binom{n}{k}$ sets of size $k$ lies in equally
many chains, by symmetry.

Now let $\mathcal F$ be a Sperner family. By assumption, any chain contains at most one
member of $\mathcal F$. So the number of chains which do contain a member of $\mathcal F$ is

$$\sum_{A \in \mathcal F} |A|!\,(n-|A|)! = n! \sum_{A \in \mathcal F} \frac{1}{\binom{n}{|A|}}.$$

Since there are only $n!$ chains altogether, we see that

$$\sum_{A \in \mathcal F} \frac{1}{\binom{n}{|A|}} \le 1.$$

Now, as we already observed, the middle binomial coefficients are the largest;
so their reciprocals are the smallest, and if we set $m = \lfloor n/2 \rfloor$, we have

$$\frac{|\mathcal F|}{\binom{n}{m}} \le \sum_{A \in \mathcal F} \frac{1}{\binom{n}{|A|}} \le 1,$$

whence $|\mathcal F| \le \binom{n}{m}$, the required bound.

When is this bound met? Examining the argument, we see that attaining the
bound forces that $\binom{n}{|A|} = \binom{n}{m}$ for every $A \in \mathcal F$, in other words, every set in $\mathcal F$ has
size either $m = \lfloor n/2 \rfloor$, or $n - m = \lceil n/2 \rceil$. If $n$ is even, then these two numbers are
equal, and $\mathcal F$ must consist of all the sets of size $m$. But if $n$ is odd, there is further
work required. In that case $n = 2m + 1$ and all the sets have size $m$ or $m + 1$, but
we have to show that either they all have size $m$, or they all have size $m + 1$.


Looking at the proof again we see that, if the bound is met, then every chain
contains one member of $\mathcal F$; so, if $A$ is an $m$-set and $B$ an $(m+1)$-set with $A \subset B$,
then $A \in \mathcal F$ if and only if $B \notin \mathcal F$. Now suppose that $A$ is an $m$-set in $\mathcal F$, and $A'$ any
other $m$-set. It is possible to find a sequence of sets beginning at $A$ and ending at
$A'$, every term being of size $m$ or $m + 1$, and each two consecutive terms related by
inclusion:

$$A \subset B_1 \supset A_1 \subset B_2 \supset \cdots \supset A'.$$


We see, following this sequence, that all of its $m$-sets belong to $\mathcal F$, while none of
its $(m+1)$-sets do. So $A' \in \mathcal F$. Since $A'$ was arbitrary, $\mathcal F$ consists of all $m$-sets.
Similarly, if there is an $(m+1)$-set in $\mathcal F$, then it consists of all $(m+1)$-sets.

The technique used here is called the LYM technique. Roughly speaking, it 
depends on the fact that a Sperner family and a chain have at most one set in 
common, and the number of chains containing a set takes only a few values. A 
simpler example along the same lines is given in Exercise 2. 


7.3. The de Bruijn-Erdős Theorem


The third result is a specialisation of the first. Instead of assuming that two sets 
meet in at least one point, we assume that they meet in exactly one.

The proof of this theorem is a bit harder than what we've had before; if you have 
trouble following it, concentrate on understanding the result. The proof uses Hall’s 
‘Marriage Theorem’ (6.2.1) on the existence of systems of distinct representatives. 


(7.3.1) De Bruijn-Erdős Theorem. Let $F$ be a family of subsets of the $n$-set $X$.
Suppose that any two sets of $F$ have exactly one point in common. Then $|F| \le n$.
If equality holds, then one of the following situations occurs:

(a) up to re-numbering the points and sets, we have $F = \{A_1,\ldots,A_n\}$, where
$A_i = \{i,n\}$ for $i = 1,\ldots,n$ (so $A_n = \{n\}$);

(b) up to re-numbering the points and sets, we have $F = \{A_1,\ldots,A_n\}$, where
$A_n = \{1,2,\ldots,n-1\}$, and $A_i = \{i,n\}$ for $1 \le i \le n-1$;

(c) for some positive integer $q$, we have $n = q^2 + q + 1$, each set in $F$ has size $q + 1$,
and each point lies in $q + 1$ members of $F$.
REMARK. Case (b) is illustrated in Fig. 7.1. The last two cases of equality overlap:
when $n = 3$ (and $q = 1$), both describe a ‘triangle’. For $q > 1$, the structure described
in (c) is called a projective plane of order $q$. These planes will be considered further
in Section 9.5. The first example (with $q = 2$) is the Steiner triple system of order 7,
to be discussed in the next chapter.
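
The three extremal situations are easy to examine by machine. The following Python sketch is illustrative only: the function names are ours, and the example for case (c) uses the seven triples listed for the STS(7) in Section 8.1. It checks that each family has $n$ members and that any two members meet in exactly one point.

```python
from itertools import combinations


def meets_in_exactly_one_point(family):
    """True if every two distinct members share exactly one point."""
    return all(len(a & b) == 1 for a, b in combinations(family, 2))


def case_a(n):
    """The sets {i, n} for i = 1..n; the last of these is {n}."""
    return [frozenset({i, n}) for i in range(1, n + 1)]


def case_b(n):
    """The set {1,...,n-1} together with the pairs {i, n} for i < n."""
    return [frozenset(range(1, n))] + [frozenset({i, n}) for i in range(1, n)]


# Case (c) with q = 2: the triples of the Steiner triple system of
# order 7 on the point set {1,...,7}.
case_c_q2 = [frozenset(t) for t in
             [(1, 2, 3), (1, 4, 5), (1, 6, 7), (2, 4, 6),
              (2, 5, 7), (3, 4, 7), (3, 5, 6)]]

for n in range(3, 9):
    for family in (case_a(n), case_b(n)):
        assert len(family) == n and meets_in_exactly_one_point(family)
assert len(case_c_q2) == 7 and meets_in_exactly_one_point(case_c_q2)
print("all three kinds of extremal family pass the check")
```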


Fig. 7.1. An extremal family


PROOF. First, we can suppose that every set in $F$ has at least two points. For if the
empty set is in $F$, it must be the only set. And if $F$ contains a singleton, say $\{n\}$,
then all the other sets of $F$ contain $n$, and any two have just $n$ in common; so there
are at most $n - 1$ more sets, the extreme case being as described in (a). Also, we
may assume that $X \notin F$; for if it were, there could be at most one further set in $F$,
a singleton.

The proof requires a trick. If $|F| \ge n$, then there is a subfamily of $F$ with $n$
members. We analyse this family, and show that no further set can be added without
violating the hypothesis. So, for most of the proof, we can assume what we have to
prove, viz., $|F| = n$.

Let $F = \{A_1,\ldots,A_n\}$. Moreover, for $i = 1,\ldots,n$, let

    $B_i = X \setminus A_i$;

    $k_i = |A_i|$;

    $r_i$ = the number of sets in $F$ which contain $i$.

($r_i$ is called the replication number of the point $i$.)
We claim that, if $i \notin A_j$, then $r_i \le k_j$. This is because each member of $F$ containing
$i$ meets $A_j$ in a unique point, and these points are all distinct (two members of $F$
through $i$ have only $i$ in common, and $i \notin A_j$).

Next we claim that the sets $B_1,\ldots,B_n$ satisfy Hall’s condition. Let $J$ be a subset
of $\{1,\ldots,n\}$; then $B(J)$ is the set of points lying outside $A_j$ for at least one $j \in J$. If
$J = \{j\}$, then $B(J) = B_j = X \setminus A_j \ne \emptyset$, by assumption; so (HC) holds in this case.
If $2 \le |J| \le n-1$, then $|B(J)| \ge n - 1$ (for, if $i,j \in J$, then every point except
perhaps the point of $A_i \cap A_j$ lies in $B(J)$). If $|J| = n$, the conclusion is clear.

Thus there is a SDR for the family $(B_j : j = 1,\ldots,n)$. If we choose the
numbering so that $i$ is a representative of $B_i$, we have the conclusion that $i \notin A_i$ for
$i = 1,\ldots,n$. From our earlier observation, this means that

$$k_i \ge r_i$$

for $i = 1,\ldots,n$.
Now count pairs $(i, A_j)$ with $i \in A_j$. Each point $i$ lies in $r_i$ sets $A_j$, and each set
$A_j$ contains $k_j$ points $i$. So we have

$$\sum_{i=1}^{n} r_i = \sum_{j=1}^{n} k_j.$$

Combining this equation with the inequalities $k_i \ge r_i$, we conclude that $r_i = k_i$
for $i = 1,\ldots,n$.

Now, considering the proof that $k_i \ge r_i$, we see that equality implies that every
point on $A_i$ lies on a member of $F$ containing the point $i$.

But we can say more. 


Look again at the application of Hall’s Theorem, and ask: could a set $J$ be
critical? The proof shows that this can only happen if $|J| = 1$ or $n - 1$. If a set
$J = \{j\}$ is critical, this means that $|B_j| = 1$, or $|A_j| = n - 1$. If a set of size
$n - 1$ is critical, this means that $n - 1$ of the sets $A_i$ pass through a fixed point.
In either of these two situations, we must have conclusion (b) of the theorem. So
we may suppose that they don’t occur, so that no set $J$ is critical except $J = \emptyset$ or
$J = \{1,\ldots,n\}$. Now the proof of Hall’s Theorem shows that we can take any set
$B_j$ and choose any of its points as its representative.

Now let $x, y$ be two points of $X$. We aim to show that some member of $F$
contains $x$ and $y$. Suppose not. Choose the numbering of $F$ so that $A_1$ contains $y$
(but not $x$). Thus $B_1$ contains $x$, and we may use $x$ as its representative. Now, as
just shown, this means that every point of $A_1$ (in particular, $y$) lies on a set of $F$
containing $x$. In other words:


any two points lie on a unique member of F. 


Of course, this holds also in case (b) of the theorem. 

It now follows that there cannot be a family $F$ with more than $n$ members. For
any $n$ of them would satisfy the above. If $A$ were an additional set, not a singleton,
and $x, y \in A$, then one of the first $n$ sets (say $A_i$) also contains $x$ and $y$, and then
$A_i$ and $A$ have at least two common points.

We also have the following condition (+):

    If the point $i$ does not lie on the set $A_j$, then $r_i = k_j$; in other
    words, if $r_i \ne k_j$, then $i \in A_j$.

Suppose that there are points $x, y$ with $r_x \ne r_y$. Then each set of $F$ contains at
least one of $x$ and $y$. If $z$ is any further point, then we may suppose that $r_x \ne r_z$
(interchanging $x$ and $y$ if necessary), and so any set contains at least one of $x$ and
$z$. But only one set, say $A$, contains both $y$ and $z$. So every set except $A$ contains $x$.
This forces the structure defined under (b).

Thus, we may suppose that $r_x$ is constant, say $r_x = q + 1$. Now $|A| = q + 1$ for all
$A \in F$, since for every set $A$ there is a point $x \notin A$, and (+) applies. Take a point $x$.
Then $q + 1$ sets of $F$ contain $x$, and each contains $q$ further points of $X$; and there
are no overlaps among these points. Thus $n = 1 + (q+1)q = q^2 + q + 1$, as claimed.


7.4. Project: Regular families 


A family F of subsets of X is regular if every point lies in a constant number r of 
elements of F. It is interesting to ask questions of extremal set theory restricted to 
regular families. This section considers regular intersecting families. First, however, 
we show that regular families do exist! 




(7.4.1) Theorem. Let b, k,n,r be positive integers satisfying 


$$bk = nr, \qquad k \le n, \qquad b \le \binom{n}{k}.$$


Then there is a regular family F of k-subsets of an n-set with |F| = b. 


PROOF.² There is a simple way to make a family ‘more regular’. Let $r_x$ be the replication number of
$x$, the number of sets of the family which contain $x$. If $r_x > r_y$, then there must exist a $(k-1)$-set
$U$, containing neither $x$ nor $y$, such that $\{x\} \cup U \in F$, $\{y\} \cup U \notin F$. Now form a new family $F'$
by removing $\{x\} \cup U$ from $F$ and including $\{y\} \cup U$ in its place. In the new family, $r'_x = r_x - 1$,
$r'_y = r_y + 1$, and all other replication numbers are unaltered. Starting with any family of $k$-sets, we
reach by this process a family in which all the replication numbers differ by at most 1 (an almost
regular family), containing the same number of sets as the original family.

But, by double counting, the average replication number is $bk/n = r$; and an almost regular
family whose average replication number is an integer must be regular.
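
The replacement procedure in this proof is effectively an algorithm, and a minimal Python sketch of it follows. The names `replication_numbers` and `make_almost_regular` are ours, and one detail is an assumption of the sketch rather than part of the proof: at each step we take $x$ with the largest and $y$ with the smallest replication number, and swap only while these differ by at least 2, which guarantees termination.

```python
from itertools import combinations


def replication_numbers(family, n):
    """r[x] = number of sets of the family containing the point x."""
    r = [0] * n
    for s in family:
        for x in s:
            r[x] += 1
    return r


def make_almost_regular(family, n):
    """Repeat the swap {x} ∪ U -> {y} ∪ U until the replication numbers
    differ by at most 1. The number and size of the sets never change."""
    family = {frozenset(s) for s in family}
    while True:
        r = replication_numbers(family, n)
        x = max(range(n), key=lambda p: r[p])
        y = min(range(n), key=lambda p: r[p])
        if r[x] - r[y] <= 1:
            return family
        # A set containing x but not y whose swapped image is not yet
        # in the family; one exists because r[x] > r[y].
        s = next(s for s in family
                 if x in s and y not in s and (s - {x}) | {y} not in family)
        family.remove(s)
        family.add((s - {x}) | {y})


n, k, b = 6, 3, 4                               # bk = 12 = nr with r = 2
start = list(combinations(range(n), k))[:b]     # an arbitrary starting family
family = make_almost_regular(start, n)
print(sorted(sorted(s) for s in family), replication_numbers(family, n))
```

Because $bk/n$ is an integer in this example, the almost regular family returned is in fact regular, as the double-counting remark predicts.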


This idea can be modified to prove a theorem of Brace and Daykin: 


(7.4.2) Theorem. If $k$ is not a power of 2, and $n = 2k$, there exists a regular intersecting family of
size $\frac{1}{2}\binom{2k}{k} = \binom{2k-1}{k-1}$ of $k$-subsets of an $n$-set.


We begin with two remarks: 


REMARK 1. As we already saw in Section 7.1, an intersecting family of $k$-subsets of a $2k$-set has size
at most $\frac{1}{2}\binom{2k}{k}$, with equality if and only if it contains one of each complementary pair of $k$-sets.

REMARK 2. The replication number of a regular family as in the theorem is $r = \frac{1}{2}\binom{2k-1}{k-1}$. This is an
integer if and only if $k$ is not a power of 2 (Exercise 4). So the condition on $k$ is necessary.


We need a slightly more complicated version of the replacement procedure, in order to preserve
the intersecting property. Let $x$ and $y$ be points with $r_x > r_y + 2$. Then there are two disjoint
$(k-1)$-subsets $U$ and $V$ of $X \setminus \{x,y\}$ such that $\{x\} \cup U, \{x\} \cup V \in F$. The complements of these
sets are $\{y\} \cup V$, $\{y\} \cup U$ respectively, and are not in $F$. If we replace both of these sets by their
complements, we obtain an intersecting family $F'$ in which $r'_x = r_x - 2$, $r'_y = r_y + 2$, and the
other replication numbers are unaltered. Applying a sequence of such operations to an arbitrary
intersecting family, we obtain a family in which the new replication numbers differ by at most 2, and are
congruent mod 2 to their initial values.

Let $F$ be any intersecting family of size $\binom{2k-1}{k-1}$, in which all the replication numbers are
congruent to $\frac{1}{2}\binom{2k-1}{k-1}$ mod 2. [Let $F_0$ be the family of all $k$-sets containing the point $x$. Its replication
numbers are $r_x = \binom{2k-1}{k-1}$, $r_y = \binom{2k-2}{k-2}$ for $y \ne x$, which are all even. If a family in which all
replication numbers are odd is required, replace a single set by its complement.] Now apply the
above process. If a collection of numbers differ by at most 2, and all have the same parity as their
(integral) average, then all the numbers are equal.


7.5. Exercises 


1. Verify the claim in Example 2 of Section 7.1. 


2. If $n = 2k$, an intersecting family of $k$-subsets of an $n$-set has size at most
$\frac{1}{2}\binom{2k}{k} = \binom{2k-1}{k-1}$, because it contains at most one of each complementary pair of $k$-sets.
We proceed to generalise this result and argument. What follows could be regarded
as a very simple version of the LYM technique. PROVE:

    Suppose that $k$ divides $n$. Then an intersecting family $F$ of $k$-subsets
    of an $n$-set $X$ has size at most $\binom{n-1}{k-1}$.

² This argument is due to David Billington.


[HINT: Let $\mathcal{C}$ be the set of all partitions of $X$ into $n/k$ subsets of size $k$. We don’t
need to know $|\mathcal{C}|$ (though this could be counted), merely the fact that each $k$-set lies
in $|\mathcal{C}| / \binom{n-1}{k-1}$ members of $\mathcal{C}$. Prove this by double-counting pairs $(B, C)$, where $B$ is
a $k$-set and $C \in \mathcal{C}$ with $B$ a member of $C$.]

Now double-count pairs $(B, C)$, with $B \in F$, $C \in \mathcal{C}$, and $B \in C$, to obtain

$$|F| \cdot \frac{|\mathcal{C}|}{\binom{n-1}{k-1}} \le |\mathcal{C}|,$$

using the fact that, since the parts of a partition are disjoint, at most one of them
lies in any intersecting family.


3. (HARDER PROBLEM). Prove that, if $k$ divides $n$ and $n \ge 3k$, then any intersecting
family of size $\binom{n-1}{k-1}$ of $k$-subsets of the $n$-set $X$ consists of all $k$-sets containing some
point of $X$. [HINT: it follows from the argument of Exercise 2 that, if $|F| = \binom{n-1}{k-1}$,
then given any partition of $X$ into disjoint $k$-sets, exactly one of these $k$-sets belongs
to $F$. Exploit this fact.]


4. Show that $\binom{2k-1}{k-1}$ is even if and only if $k$ is not a power of 2.


5. (a) If $n$ is not a power of 2, construct a regular intersecting family of subsets of
an $n$-set, having size $2^{n-1}$.
(b) If $n = 2$, 4 or 8, show that there is no such family.


6. Prove that, in any intersecting family of size $\binom{2k-1}{k-1}$ of $k$-subsets of a $2k$-set, the
replication numbers all have the same parity.


7. Let $F$ be any intersecting family of subsets of the $n$-set $X$. Show that there is an
intersecting family $F' \supseteq F$ with $|F'| = 2^{n-1}$. [HINT: A blocking set for $F$ is a set
$Y$ which meets every member of $F$ but contains none. Adjoin to $F$ all sets which
contain a member of $F$, all blocking sets of size greater than $\frac{1}{2}n$, and (if $n$ is even)
one of each complementary pair of blocking sets of size $\frac{1}{2}n$.]

By proving that the Steiner triple system of order 7 has no blocking sets, give
another proof of Exercise 1.


8. Let $\mathcal{F}$ be a Sperner family of subsets of the $n$-set $X$. Define $b(\mathcal{F})$ to be the family
of all subsets $Y$ of $X$ such that
(i) $Y \cap F \ne \emptyset$ for all $F \in \mathcal{F}$;
(ii) $Y$ is minimal subject to (i) (i.e., no proper subset of $Y$ satisfies (i)).
(a) Prove that $b(\mathcal{F})$ is a Sperner family.
(b) Show that, for any $F \in \mathcal{F}$ and any $y \in F$, there exists $Y \in b(\mathcal{F})$ with $Y \cap F = \{y\}$.
(c) Deduce that $b(b(\mathcal{F})) = \mathcal{F}$.
(d) Let $\mathcal{F}_k$ denote the Sperner family of all $k$-subsets of $X$. Prove that $b(\mathcal{F}_k) = \mathcal{F}_{n+1-k}$
for $k > 0$. What is $b(\mathcal{F}_0)$?


8. Steiner triple systems 


... how did the Cambridge and Dublin Mathematical Journal, Vol. II, p. 191
[1846] manage to steal so much from ... Crelle's Journal, Vol. LVI, p. 326 
[1859], on exactly the same problem in combinations? 


T. P. Kirkman (1887) 


TOPICS: Steiner triple systems; packings and coverings; [tournament 
schedules; finite geometries] 


TECHNIQUES: Direct and recursive constructions; [use of linear 
algebra and finite fields for constructions] 


ALGORITHMS: 


CROSS-REFERENCES: Extremal set theory (Chapter 7); [finite fields 
(Chapter 4); finite geometry (Chapter 9)] 


This chapter is devoted to the proof of existence of Steiner triple systems. The topic is 
somewhat specialised; but the technique, involving a mixture of direct and recursive 
constructions (the latter building up large objects of some type from smaller ones) 
is of very wide applicability. 


8.1. Steiner systems 


In 1845, the following problem in extremal set theory was posed in an unlikely 
forum, the Lady’s and Gentleman's Diary: 


Given integers $l, m, n$ with $l < m < n$, what is the greatest number
of m-element subsets of an n-element set with the property that 
any l-element subset lies in at most one of the chosen sets? 


The problem proved too difficult for the journal’s readership, and so it was specialised
to the case $l = 2$, $m = 3$. This provided the incentive for a 40-year-old Lancashire
vicar, T. P. Kirkman, to take up mathematics: his first published paper, the following
year, contained a contribution to this case.¹

Returning to the general problem for a moment, we observe: 


¹ Kirkman is now remembered almost entirely for his work on this problem, but he also wrote
extensively on projective geometry, groups, polyhedra, and knots, and was regarded as one of the 
leading British mathematicians of his day. An account of his life and work can be found in the 
article ‘T. P. Kirkman: Mathematician’ by Norman Biggs in the Bulletin of the London Mathematical 
Society 13 (1981), 97-120. 
(8.1.1) Proposition. Let $\mathcal{B}$ be a family of $m$-subsets of an $n$-set, such that any $l$-set
lies in at most one member of $\mathcal{B}$. Then

$$|\mathcal{B}| \le \binom{n}{l} \Big/ \binom{m}{l}.$$

Equality holds if and only if any $l$-subset lies in exactly one member of $\mathcal{B}$.


PROOF. We count pairs $(L, B)$, where $L$ is an $l$-set and $B \in \mathcal{B}$ with $L \subseteq B$. Each
$B \in \mathcal{B}$ contains $\binom{m}{l}$ subsets of size $l$, so there are $|\mathcal{B}| \cdot \binom{m}{l}$ pairs. On the other hand,
there are altogether $\binom{n}{l}$ subsets of size $l$, and each lies in at most one set $B$, so there
are at most $\binom{n}{l}$ pairs. Thus

$$|\mathcal{B}| \binom{m}{l} \le \binom{n}{l}.$$

Equality is only possible if every $l$-set lies in a (unique) member of $\mathcal{B}$.

A pair $(X, \mathcal{B})$, where $X$ is an $n$-set and $\mathcal{B}$ a family of $m$-subsets satisfying the
hypotheses of the proposition and attaining the bound, is called a Steiner system
$S(l,m,n)$.² A very important specialisation of the above problem is the following:


For which values of l, m,n does a Steiner system S(l,m,n) exist? 


A Steiner system $S(2, 3, n)$ is called a Steiner triple system. To reiterate: a Steiner
triple system consists of a set $X$ of points and a set $\mathcal{B}$ of 3-element subsets of $X$
(called triples or blocks), with the property that any two points of $X$ lie in a unique
triple. The number $n$ is called the order of the Steiner triple system. In this chapter,
we settle the existence question for Steiner triple systems. I will usually abbreviate
‘Steiner triple system’ to STS, and write STS($n$) for a Steiner triple system of order
$n$.


First, some examples. 


Fig. 8.1. Two small Steiner triple systems 


² The name is a double misnomer: the question posed by Steiner was not equivalent to the existence
of Steiner systems, though they are the same in the special case $l = 2$, $m = 3$; and this special
case was settled by Kirkman seven years before the question was asked by Steiner. However, the
terminology is now standard, and the term ‘Kirkman system’ has a different meaning.




Fig. 8.1(a) shows a STS(7). More formally, $(X, \mathcal{B})$ is a STS(7), where
    $X = \{1,2,3,4,5,6,7\}$,
    $\mathcal{B} = \{123, 145, 167, 246, 257, 347, 356\}$.

(We write 123 for the set $\{1,2,3\}$, and so on.)

Fig. 8.1(b) shows a STS(9). Note that it solves the ‘nine schoolgirls’ problem
posed in Chapter 1: the walking scheme is
    Day 1: 123 456 789
    Day 2: 147 258 369
    Day 3: 159 267 348
    Day 4: 357 168 249

Moreover, there are trivial Steiner triple systems of orders 3 (three points forming 
a triple), 1 (a single point, no triples), and 0 (no points or triples). Before reading 
further, show that there is no Steiner triple system of order 2, 4, 5 or 6. 
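
The defining property is simple to test mechanically. The following Python sketch (the helper names `is_sts` and `parse` are ours) checks that the two systems listed above really are Steiner triple systems, and that each day of the walking scheme partitions the nine points.

```python
from collections import Counter
from itertools import combinations


def is_sts(points, triples):
    """True if every 2-subset of points lies in exactly one of the triples."""
    triples = [frozenset(t) for t in triples]
    if any(len(t) != 3 or not t <= set(points) for t in triples):
        return False
    cover = Counter(frozenset(pair) for t in triples
                    for pair in combinations(t, 2))
    return all(cover[frozenset(pair)] == 1
               for pair in combinations(points, 2))


def parse(text):
    """Turn a string such as '123 145' into a list of sets of digits."""
    return [{int(ch) for ch in word} for word in text.split()]


sts7 = parse("123 145 167 246 257 347 356")
days = ["123 456 789", "147 258 369", "159 267 348", "357 168 249"]
sts9 = [t for day in days for t in parse(day)]

print(is_sts(range(1, 8), sts7), is_sts(range(1, 10), sts9))
# Each day of the walking scheme should use every point exactly once.
print(all(set().union(*parse(day)) == set(range(1, 10)) for day in days))
```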

The next theorem determines completely the possible orders of Steiner triple 
systems. 


(8.1.2) Theorem. There exists a Steiner triple system of order $n$ if and only if either
• $n = 0$; or
• $n \equiv 1$ or 3 (mod 6).


This theorem asserts that a numerical condition is necessary and sufficient for 
the existence of something. So the proof has two parts. First, we must show that the 
order of a Steiner triple system satisfies the constraint: the argument is given below. 
Second, given a number n of the correct form, we have to construct a STS(n). This 
is more difficult, and will take the next two sections. 


PROOF OF NECESSITY. Suppose that $(X, \mathcal{B})$ is a STS of order $n$. Clearly, we may
suppose that $n > 0$. We establish two important properties by ‘double counting’.

1. Any point lies in $(n-1)/2$ triples.

Choose a point $x$, and count pairs $(y, B)$, where $y$ is a point different from $x$,
and $B$ a triple containing $x$ and $y$. First, there are $n - 1$ choices for $y$ and, for each
choice, there is a unique triple containing $x$ and $y$: altogether $n - 1$ pairs. Second,
if $x$ lies in $r$ triples, then (since each triple contains two points other than $x$) there
are $2r$ choices of the pair $(y, B)$. Hence $2r = n - 1$, and $r$ is as claimed.

2. There are $n(n-1)/6$ triples altogether.

We count pairs $(x, B)$, where $x$ is a point and $B$ a triple containing $x$. Each
of the $n$ points lies in $(n-1)/2$ triples, so there are $n(n-1)/2$ pairs. If there are
$b$ triples, each containing 3 points, then there are $3b$ choices. So $3b = n(n-1)/2$,
giving the claimed value for $b$.

Now the necessity of the condition follows. For, if $n > 0$, then both $(n-1)/2$
and $n(n-1)/6$ must be whole numbers. The first condition asserts that $n$ is odd,
whence $n \equiv 1$, 3 or 5 (mod 6). Suppose that $n \equiv 5$ (mod 6), say $n = 6k + 5$. Then
the number of triples is

$$n(n-1)/6 = (6k+5)(3k+2)/3;$$

but this is not an integer, since neither $6k + 5$ nor $3k + 2$ is a multiple of 3. So $n$
must be congruent to 1 or 3 modulo 6.


Note that, if $n \equiv 1$ or 3 (mod 6), then both $(n-1)/2$ and $n(n-1)/6$ are integers;
but this in itself is no guarantee that a STS($n$) exists, of course.
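
As a quick numerical illustration of the necessity argument, the short Python check below (the function name is ours) lists the positive $n$ for which both counts are whole numbers and confirms that they are exactly the $n$ congruent to 1 or 3 modulo 6 in the range tested. It says nothing about existence, which is the harder half of the theorem.

```python
def counts_are_integers(n):
    """Necessary condition: (n-1)/2 and n(n-1)/6 are both whole numbers."""
    return (n - 1) % 2 == 0 and (n * (n - 1)) % 6 == 0


admissible = [n for n in range(1, 40) if counts_are_integers(n)]
print(admissible)
assert admissible == [n for n in range(1, 40) if n % 6 in (1, 3)]
```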


8.2. A direct construction 


The proof of sufficiency given in the next section involves a recursive construction, in 
which larger Steiner triple systems are built up from smaller ones. In this section, we 
show that a direct construction can be used to prove half of the theorem. Specifically: 


(8.2.1) Proposition. If $n \equiv 3$ (mod 6), there exists a STS($n$).


Suppose that $n \equiv 3$ (mod 6); then $n = 3m$ where $m$ is odd. The point set is
made up of three copies of the integers mod $m$. Formally,

$$X = \{a_i, b_i, c_i : i \in Z/(m)\}.$$


Blocks are of two types:
(a) all sets of the form $a_ia_jb_k$, $b_ib_jc_k$, or $c_ic_ja_k$, where $i,j,k \in Z/(m)$, $i \ne j$, and
$i + j = 2k$ (in $Z/(m)$);

(b) all sets of the form $a_ib_ic_i$, for $i \in Z/(m)$.

Before verifying that this works, observe that the equation $i + j = 2k$ has a
unique solution (in $Z/(m)$) for any one of the variables, given the other two. This
is clear for $i$ and $j$. For $k$ it depends on the fact that (since $m$ is odd) any element
of $Z/(m)$ can be uniquely divided by 2: depending on the parity of $l$, either $l/2$ or
$(l+m)/2$ is the unique solution of $2x = l$.

First let us count the triples. There are $\binom{m}{2} = m(m-1)/2$ choices of $i$ and $j$,
and for each choice, a unique $k$ and hence three triples of the first type. There are
clearly $m$ triples of the second type. This makes altogether

$$3m(m-1)/2 + m = 3m(3m-1)/6$$

triples, as required. Now let us verify that they do form a Steiner triple system, by
showing that any two points lie in a unique triple.

There are several cases: 

(i) Points $a_i$ and $a_j$, $i \ne j$. A triple containing them must be of type (a); by our
remark above, there is a unique such triple.

(ii) Points $b_i$ and $b_j$, or $c_i$ and $c_j$: these cases are similar.

(iii) Points $a_i$ and $b_i$. These lie in a unique triple of type (b); and in no triple of
type (a), since if $a_ia_jb_i$ were such a triple, then $i + j = 2i$, whence $j = i$.

(iv) Points $b_i$ and $c_i$, or $c_i$ and $a_i$: similarly.

(v) Points $a_i$ and $b_k$, $k \ne i$: these lie in a unique block of type (a).

(vi) Points $b_i$ and $c_k$, or $c_i$ and $a_k$: similarly.
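
The direct construction translates almost line for line into code. In the Python sketch below (function names ours), a point is encoded as a pair $(t, i)$ with $t = 0, 1, 2$ standing for the letters $a, b, c$; this encoding is an assumption of the sketch, not notation from the text. Division by 2 in $Z/(m)$ is done by multiplying by $(m+1)/2$.

```python
from collections import Counter
from itertools import combinations


def direct_sts(n):
    """Triples of types (a) and (b) on the points (t, i), i in Z/(m)."""
    assert n % 6 == 3
    m = n // 3
    half = (m + 1) // 2               # 2 * half = m + 1 = 1 in Z/(m)
    triples = []
    for i, j in combinations(range(m), 2):
        k = (i + j) * half % m        # the unique k with i + j = 2k
        for t in range(3):            # a_i a_j b_k, b_i b_j c_k, c_i c_j a_k
            triples.append(frozenset({(t, i), (t, j), ((t + 1) % 3, k)}))
    for i in range(m):                # type (b): a_i b_i c_i
        triples.append(frozenset({(0, i), (1, i), (2, i)}))
    points = [(t, i) for t in range(3) for i in range(m)]
    return points, triples


def is_sts(points, triples):
    """True if every pair of points lies in exactly one triple."""
    cover = Counter(frozenset(pair) for t in triples
                    for pair in combinations(t, 2))
    return (all(len(t) == 3 for t in triples) and
            all(cover[frozenset(pair)] == 1
                for pair in combinations(points, 2)))


for n in (3, 9, 15, 21):
    points, triples = direct_sts(n)
    # Order, whether the triple count is n(n-1)/6, and the STS check.
    print(n, len(triples) == n * (n - 1) // 6, is_sts(points, triples))
```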


For n = 9, we get a different-looking STS of order 9. In fact it turns out to be 
the same as before, just drawn differently (see Fig. 8.2, in which three triples are not 


Fig. 8.2. STS(9)


8.3. A recursive construction 


Before embarking on the main business, we attend to one important detail: the 
construction of a STS(13). 

For this, it would be sufficient to give a list of 13 points and $13 \cdot 12/6 = 26$ triples,
and leave the verification to the reader. However, the construction is a special case 
of something more general, so we give it abstractly. 

We take X to be the set Z/(13) of residue classes modulo 13. Consider first the 
sets 

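$$B_1 = \{0,1,4\}, \qquad B_2 = \{0,2,8\}.$$

We claim that the following holds.

    For any non-zero $z \in X$, there is a unique way to write $z = x - y$
    with $x, y$ chosen from the same set $B_i$.

This is seen by listing all possibilities:

$$\begin{array}{llll}
1 = 1 - 0 & 2 = 2 - 0 & 3 = 4 - 1 & 4 = 4 - 0 \\
5 = 0 - 8 & 6 = 8 - 2 & 7 = 2 - 8 & 8 = 8 - 0 \\
9 = 0 - 4 & 10 = 1 - 4 & 11 = 0 - 2 & 12 = 0 - 1
\end{array}$$

and noting that each of the $2 \cdot 3 \cdot 2 = 12$ expressions $u - v$ for $u, v \in B_i$, $i = 1, 2$ has
been used once.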


Now let

$$\mathcal{B} = \{B_1 + z,\ B_2 + z : z \in X\},$$

where $B_i + z = \{t + z : t \in B_i\}$. (So $\mathcal{B}$ consists of the triples 014, 125, 236, ..., 028,
139, ... ; 26 in all.)

We claim that $(X, \mathcal{B})$ is a STS. Clearly $X$ is a 13-set and $\mathcal{B}$ a set of 3-subsets of
$X$. Let $x, y$ be distinct points of $X$. If $x, y \in B_i + z$, then $x - z, y - z \in B_i$, and
$(x - z) - (y - z) = x - y$. By the claim above, there is a unique choice of $i, u, v$ so
that $x - y = u - v$ with $u, v \in B_i$; and then $x - z = u$, so $z = x - u = y - v$ is also
determined. So there is a unique triple containing $x$ and $y$.

This technique works whenever we can find sets $B_1, \ldots$ in $Z/(n)$ such that any
non-zero element of $Z/(n)$ can be written uniquely in the form $u - v$, with $u$ and $v$
chosen from the same $B_i$. For example, with $n = 7$, it is possible to use a single set
$B_1 = \{0,1,3\}$, giving rise to the familiar STS(7) labelled in a different way (Fig. 8.3).
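
Both the difference condition and the resulting system can be checked by a few lines of Python. The sketch below is illustrative (the names `difference_counts`, `develop` and `is_sts` are ours); it uses the base sets $\{0,1,4\}, \{0,2,8\}$ for $n = 13$ and $\{0,1,3\}$ for $n = 7$ given above.

```python
from collections import Counter
from itertools import combinations, permutations


def difference_counts(base_blocks, n):
    """How often each residue arises as u - v with u, v in one base block."""
    return Counter((u - v) % n for B in base_blocks
                   for u, v in permutations(B, 2))


def develop(base_blocks, n):
    """All translates B + z of the base blocks, z in Z/(n)."""
    return [frozenset((t + z) % n for t in B)
            for B in base_blocks for z in range(n)]


def is_sts(points, triples):
    """True if every pair of points lies in exactly one triple."""
    cover = Counter(frozenset(pair) for t in triples
                    for pair in combinations(t, 2))
    return (all(len(t) == 3 for t in triples) and
            all(cover[frozenset(pair)] == 1
                for pair in combinations(points, 2)))


for n, base in [(13, [{0, 1, 4}, {0, 2, 8}]), (7, [{0, 1, 3}])]:
    counts = difference_counts(base, n)
    each_once = all(counts[d] == 1 for d in range(1, n))
    triples = develop(base, n)
    # Order, difference condition, number of triples, STS check.
    print(n, each_once, len(triples), is_sts(range(n), triples))
```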


Fig. 8.3. STS(7) in cyclic form


This construction works much more generally. We use it for all primes n (indeed, 
all prime powers) which are congruent to 1 (mod 6) in a Project (Section 8.5). 


Now we come to the main technical result. This is a recursive construction, 
building larger systems from smaller ones. First, a definition. 


A subsystem of the Steiner triple system $(X, \mathcal{B})$ is a subset $Y$ of $X$ with the
property that any triple in $\mathcal{B}$ which contains two points of $Y$ is contained within $Y$.
If $\mathcal{C}$ is the set of triples contained within the subsystem $Y$, then $(Y, \mathcal{C})$ is a Steiner
triple system in its own right, and we may refer to this as a subsystem without
confusion.


(8.3.1) Proposition. Suppose that there exists a STS of order $v$ containing a subsystem
of order $u$, and also there exists a STS of order $w$. Then there exists a STS of order
$u + w(v - u)$. If $w > 0$, it contains a copy of the STS($v$) as a subsystem. Moreover,
if $0 < u < v$ and $w > 1$, then it can be assumed to have a subsystem of order 7.


EXAMPLE. Given this result, we can give two constructions of a STS of order 19 (the 
smallest value for which we haven't yet constructed a STS). In the proposition, take 
either 

• $u = 1$, $v = 3$, $w = 9$ ($19 = 1 + 9(3 - 1)$); or

• $u = 1$, $v = 7$, $w = 3$ ($19 = 1 + 3(7 - 1)$).


The idea behind the construction is described like this. Imagine that the STS(v) 
is drawn on a piece of paper, with the points of the STS(u) on the left-hand side. 
Make w copies of this page. Now bind them into a book by glueing them together 
on the left, so that the points of the STS(u) on the different pages become identified 
(and lie on the spine of the book). We have u + w(v — u) points, as required. 
Moreover, we have some triples already, all those lying on a single page of the 
book (possibly using points of the spine). Any further triple uses three points from 
different pages. We use the STS(w) to help us choose these triples; so we imagine 
that the pages are numbered by its points. 

Formally, then, let the point set of the STS($v$) be $\{a_1,\ldots,a_u\} \cup \{b_i : i \in Z/(m)\}$,
where $m = v - u$, and the points $\{a_1,\ldots,a_u\}$ form the STS($u$). Let the points of the
STS($w$) be $\{c_1,\ldots,c_w\}$. Take

$$X = \{a_1,\ldots,a_u\} \cup \{d_{p,i} : p = 1,\ldots,w;\ i \in Z/(m)\}.$$




The blocks are of two types:

(a) the blocks of the STS($v$), copied onto each ‘page’ (each set consisting of all the
$a_i$ and all the $d_{p,i}$ with fixed $p$) by the mapping that fixes all $a_i$ and maps $b_i$ to
$d_{p,i}$;

(b) all sets of the form $d_{p_1,i_1} d_{p_2,i_2} d_{p_3,i_3}$ for which the ‘page numbers’ $c_{p_1}, c_{p_2}, c_{p_3}$ form
a triple of the STS($w$) and $i_1 + i_2 + i_3 = 0$ (in $Z/(m)$).

Let us check that it works. Take two points. If they lie on the same page,
then they lie in a unique triple of the first type, by the defining property of the
STS($v$). If they lie on different pages, then they have the form $d_{p_1,i_1}$ and $d_{p_2,i_2}$, where
$p_1 \ne p_2$. Then the third point on the triple must have the form $d_{p_3,i_3}$; $p_3$ is uniquely
determined by the requirement that $c_{p_1}c_{p_2}c_{p_3}$ is a triple of the STS($w$), and $i_3$ by the
requirement that $i_1 + i_2 + i_3 = 0$.
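
The ‘book’ construction can be carried out mechanically. The following Python sketch is one possible rendering, with names of our own choosing (`book_sts`, `spine`, `on_page`); spine points are encoded as `("a", x)` and page points $d_{p,i}$ as `("d", p, i)`, which is an assumption of the sketch. It reproduces both constructions of a STS(19) from the Example.

```python
from collections import Counter
from itertools import combinations


def is_sts(points, triples):
    """True if every pair of points lies in exactly one triple."""
    cover = Counter(frozenset(pair) for t in triples
                    for pair in combinations(t, 2))
    return (all(len(t) == 3 for t in triples) and
            all(cover[frozenset(pair)] == 1
                for pair in combinations(points, 2)))


def book_sts(v_points, v_triples, spine, w_points, w_triples):
    """STS(u + w(v-u)) from an STS(v) with subsystem `spine` and an STS(w)."""
    spine = set(spine)
    off_spine = [x for x in v_points if x not in spine]
    m = len(off_spine)
    index = {x: i for i, x in enumerate(off_spine)}    # b_i -> i in Z/(m)

    def on_page(p, x):
        # Spine points are shared by all pages; b_i becomes d_{p,i}.
        return ("a", x) if x in spine else ("d", p, index[x])

    triples = set()
    for p in w_points:                                  # type (a)
        for t in v_triples:
            triples.add(frozenset(on_page(p, x) for x in t))
    for p1, p2, p3 in w_triples:                        # type (b)
        for i1 in range(m):
            for i2 in range(m):
                i3 = (-i1 - i2) % m                     # i1 + i2 + i3 = 0
                triples.add(frozenset(
                    {("d", p1, i1), ("d", p2, i2), ("d", p3, i3)}))
    points = ([("a", x) for x in sorted(spine)] +
              [("d", p, i) for p in w_points for i in range(m)])
    return points, list(triples)


sts3 = [(1, 2, 3)]
sts7 = [(1, 2, 3), (1, 4, 5), (1, 6, 7), (2, 4, 6),
        (2, 5, 7), (3, 4, 7), (3, 5, 6)]
sts9 = [(1, 2, 3), (4, 5, 6), (7, 8, 9), (1, 4, 7), (2, 5, 8), (3, 6, 9),
        (1, 5, 9), (2, 6, 7), (3, 4, 8), (3, 5, 7), (1, 6, 8), (2, 4, 9)]

for args in [(range(1, 4), sts3, {1}, range(1, 10), sts9),   # u=1, v=3, w=9
             (range(1, 8), sts7, {1}, range(1, 4), sts3)]:   # u=1, v=7, w=3
    points, triples = book_sts(*args)
    print(len(points), len(triples), is_sts(points, triples))
```

Storing the triples in a set means that a triple lying wholly on the spine, which is copied identically onto every page, is counted only once.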

It remains to show the last part, about the subsystem of order 7. Suppose that
$0 < u < v$ and $w > 1$. Since $u > 0$, we may take a point $a$ of the subsystem. Since
$v > u$, we have $m = v - u$ even; choose the numbering of the points $b_i$ so that
$a b_0 b_{m/2}$ is a triple (otherwise it is arbitrary). Since $w > 1$, there is a triple in the
STS($w$), say $c_1c_2c_3$. Now it is easily checked that the seven points

$$\{a\} \cup \{d_{p,i} : p = 1,2,3;\ i = 0, m/2\}$$


form a subsystem (see Fig. 8.4). 


Fig. 8.4. A STS(7) subsystem 


Now let $A$ be the set of positive integers $n$ for which there is a Steiner triple
system of order $n$; we have to show that $A$ contains all numbers $n \equiv 1$ or 3 (mod
6). Also, we let $B$ be the set of positive integers $n$ for which there exists a Steiner
triple system of order $n$ containing a subsystem of order 7.
We note that $B$ is a subset of $A$. Also, the following implications hold:

    $n \in A \Rightarrow 3n \in A$
    $n \in B \Rightarrow 3n \in B$
    $n \in A \Rightarrow 3n - 2 \in B$
    $n \in A \Rightarrow 3n - 6 \in B$
    $n \in B \Rightarrow 3n - 14 \in B$

These are justified by (8.3.1), with the following values of $(u, v, w)$:


We divide the potential members of $B$ into congruence classes modulo 18: any
admissible $n$ is congruent to 1, 3, 7, 9, 13 or 15 (mod 18). Now we have

    $6k + 1 \in A \Rightarrow 18k + 1 = 3(6k + 1) - 2 \in B$
    $6k + 3 \in A \Rightarrow 18k + 3 = 3(6k + 3) - 6 \in B$
    $6k + 3 \in A \Rightarrow 18k + 7 = 3(6k + 3) - 2 \in B$
    $6k + 3 \in B \Rightarrow 18k + 9 = 3(6k + 3) \in B$
    $6k + 9 \in B \Rightarrow 18k + 13 = 3(6k + 9) - 14 \in B$
    $6k + 7 \in A \Rightarrow 18k + 15 = 3(6k + 7) - 6 \in B$

We claim that every admissible number $n \ge 15$ lies in $B$. Suppose not, and take
a least counterexample $n$. If $n \equiv 1$ (mod 18), then we must have $n < 55$; for if
$18k + 1 \ge 55$, then $6k + 1 \ge 19$, so $6k + 1 \in B$ (since $6k + 1$ is at least 15 and is
smaller than the least counterexample), and $18k + 1 \in B$ also. So $n = 19$ or 37.
Checking the other congruence classes this way, we find the possible values of $n$ to
be 15, 19, 21, 25, 27, 33, 37. So the claim will be proved if we can show that each
of these numbers is in $B$. Suitable values of $(u, v, w)$ in (8.3.1), with the relevant
equation, are given in the following table.

    (1, 3, 7)     15 = 1 + 7(3 - 1)
    (1, 3, 9)     19 = 1 + 9(3 - 1)
    (0, 7, 3)     21 = 0 + 3(7 - 0)
    (1, 9, 3)     25 = 1 + 3(9 - 1)
    (1, 3, 13)    27 = 1 + 13(3 - 1)
    (3, 13, 3)    33 = 3 + 3(13 - 3)
    (1, 13, 3)    37 = 1 + 3(13 - 1)

We use the fact that $7, 9, 13 \in A$, as established earlier.
So $B$ does contain all admissible $n \ge 15$. Since $B \subseteq A$, and since $A$ contains 1,
3, 7, 9 and 13, the theorem is proved.

The proof of the theorem is constructive: given a number $n \equiv 1$ or 3 (mod 6),
if $n$ is sufficiently large, then we can read off from the proof a number $n'$ such that
an STS($n'$) can be used to construct an STS($n$). For example, how to construct a
STS(625)? Since $625 = 18 \cdot 34 + 13$, and $6 \cdot 34 + 9 = 213$, we require a STS(213).
Then, since $213 = 18 \cdot 11 + 15$, and $6 \cdot 11 + 7 = 73$, we need a STS(73). Then, since
$73 = 18 \cdot 4 + 1$, and $6 \cdot 4 + 1 = 25$, we need a STS(25). Then the recursion ‘bottoms
out’, since the proof tells us how to construct this system.
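
The bookkeeping in this example is easily automated. In the Python sketch below, the table `SMALLER` simply transcribes the six congruence classes modulo 18 used in the proof, and the names `SMALLER` and `reduction_chain` are ours. The sketch stops as soon as the order is at most 37, where the proof supplies the system directly.

```python
# For n = 18k + r, the proof obtains an STS(n) from an STS(6k + SMALLER[r]).
SMALLER = {1: 1, 3: 3, 7: 3, 9: 3, 13: 9, 15: 7}


def reduction_chain(n):
    """Orders of the systems needed, from n down to a directly built one."""
    if n % 6 not in (1, 3):
        raise ValueError("n must be congruent to 1 or 3 (mod 6)")
    chain = [n]
    while n > 37:
        k, r = divmod(n, 18)
        n = 6 * k + SMALLER[r]
        chain.append(n)
    return chain


print(reduction_chain(625))   # [625, 213, 73, 25]
```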

The construction given here is by no means the only one possible. Exercises 2 
and 13 yield a completely different STS(625). 


This is not the end of the story: one can ask how many different ways there
are of forming a Steiner triple system on a set of $n$ points, where $n \equiv 1$ or 3 (mod
6). But we now pursue a different question: if $n$ is not of this form, how close can
we get to a STS?


8.4. Packing and covering 


Steiner triple systems represent special solutions to an extremal set problem — 
indeed, to two such problems, as we now discuss. This situation, where a structure 
satisfying a condition containing the words ‘exactly one’ is an extreme case for both 
‘at most one’ and ‘at least one’, is very common; the extremal problems are referred 
to as packing and covering problems. 

Let $X$ be a set with $n$ elements. A (2,3)-packing is a set $\mathcal{B}$ of triples such that any
two points of $X$ are contained in at most one member of $\mathcal{B}$; and a (2,3)-covering is
a set $\mathcal{B}$ of triples such that any two points are contained in at least one member of
$\mathcal{B}$. Obviously any subset of a packing is a packing, and any superset of a covering
is a covering; so we let $p(n)$ denote the size of the largest (2,3)-packing, and $c(n)$
the size of the smallest (2,3)-covering, of an $n$-set.


(8.4.1) Proposition. (a) $p(n) \le n(n-1)/6$.
(b) $c(n) \ge n(n-1)/6$.
(c) Equality holds in either bound if and only if there exists a STS($n$).


PROOF. The arguments are straight double counting. For packings, each of the
$n(n-1)/2$ pairs is contained in at most one triple, and each of the $p(n)$ triples
contains exactly three pairs. For coverings, the inequality reverses.


Thus, if $n \equiv 1$ or 3 (mod 6), we have $p(n) = c(n) = n(n-1)/6$. For other values,
$p(n)$ is smaller than this bound, and $c(n)$ is larger. It is possible to prove a general
result improving the inequalities:


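(8.4.2) Proposition. (a) $p(n) \le \left\lfloor \frac{n}{3} \left\lfloor \frac{n-1}{2} \right\rfloor \right\rfloor$.
(b) $c(n) \ge \left\lceil \frac{n}{3} \left\lceil \frac{n-1}{2} \right\rceil \right\rceil$.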


PROOF. We follow the argument for the necessary condition for the existence of a
STS($n$) (8.1.2). Let $\mathcal{B}$ be a packing. Then, by double counting, any point $x$ lies in
at most $\frac{n-1}{2}$ triples of $\mathcal{B}$. However, the number of triples containing $x$ is an integer,
so we can round this number down to $\lfloor \frac{n-1}{2} \rfloor$. Then, again by double counting, the
number of triples is at most $\frac{n}{3} \lfloor \frac{n-1}{2} \rfloor$; and, again, we can round this number down.
The argument for coverings is similar, except that we round up.


These bounds are not always attained. But there is one case where they are met:

(8.4.3) Proposition. If $n \equiv 0$ or 2 (mod 6), then $p(n) = n(n-2)/6$.


PROOF. $n$ is even, so $\lfloor \frac{n-1}{2} \rfloor = \frac{n-2}{2}$. Then $\frac{n(n-2)}{6}$ is an integer, so this quantity is our
upper bound for $p(n)$.

On the other hand, there exists a Steiner triple system of order n + 1, since this 
number is congruent to 1 or 3 (mod 6). This STS has (n + 1)n/6 blocks, of which each 
point lies in n/2. So, if we remove one point and all triples containing it, we obtain a 
packing of size 

(n + 1)n/6 − n/2 = n(n − 2)/6. 
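
The bounds of (8.4.2) are easy to tabulate. The following is a minimal sketch in Python; the function names are illustrative and not part of the text. It uses integer arithmetic only, so no rounding errors can occur.

```python
def packing_upper_bound(n: int) -> int:
    """Upper bound of (8.4.2)(a): floor((n/3) * floor((n-1)/2))."""
    return (n * ((n - 1) // 2)) // 3


def covering_lower_bound(n: int) -> int:
    """Lower bound of (8.4.2)(b): ceil((n/3) * ceil((n-1)/2))."""
    inner = -(-(n - 1) // 2)      # ceiling of (n-1)/2 via negated floor division
    return -(-(n * inner) // 3)   # ceiling of n*inner/3


if __name__ == "__main__":
    for n in range(3, 10):
        print(n, packing_upper_bound(n), covering_lower_bound(n))
```

For n ≡ 0 or 2 (mod 6) the first function returns n(n − 2)/6, in agreement with (8.4.3); for other n the value is only a bound, and need not be attained.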


8.5. Project: Some special Steiner triple systems 


This section describes some constructions of Steiner triple systems by algebraic, 
rather than combinatorial, methods. The resulting systems have a high degree of 
symmetry. 

PROJECTIVE TRIPLE SYSTEMS. 

In this subsection and the next, we construct examples of highly symmetric Steiner triple systems, 
using linear algebra over the fields Z/(2) and Z/(3). These systems are instances of more general 
‘finite geometries’, to be treated in Chapter 9. 


Let F be the field Z/(2) of order 2. Let V be a vector space of dimension d over F. Then V 
can be realised concretely as the set of all d-tuples of elements of F, so that |V| = 2^d. We take X to 
be the set of non-zero vectors in V, and 


B = {{x, y, z} : x, y, z distinct, x + y + z = 0}. 


CLAIM. (X, B) is a Steiner triple system of order 2^d − 1. 


PROOF. It’s clear that, if x + y + z = 0, then any two of x, y, z determine the third. We have to show 
that, if x and y are distinct and non-zero, then z is distinct from both and non-zero. So suppose that 
0 ≠ x ≠ y ≠ 0. Then z = −(x + y) = x + y (since −1 = 1 in F). Since y ≠ 0, we have z ≠ x; since 
x ≠ 0, we have z ≠ y; and since x ≠ y, we have z = x + y = x − y ≠ 0. 
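
The construction can be checked by machine for small d. In the sketch below (an illustration, not part of the text) a vector of Z/(2)^d is encoded as an integer bitmask, so that vector addition becomes bitwise XOR; the helper names are assumptions of this example.

```python
from itertools import combinations


def projective_triple_system(d):
    """Points: non-zero vectors of Z/(2)^d as bitmasks 1..2^d - 1.
    Triples: {x, y, x + y}, where + is XOR (addition over Z/(2))."""
    points = list(range(1, 2 ** d))
    triples = {frozenset((x, y, x ^ y)) for x, y in combinations(points, 2)}
    return points, triples


def is_sts(points, triples):
    """True if every triple has 3 points and every pair lies in exactly one triple."""
    counts = {}
    for t in triples:
        if len(t) != 3:
            return False
        for pair in combinations(sorted(t), 2):
            counts[pair] = counts.get(pair, 0) + 1
    return all(counts.get(pair, 0) == 1 for pair in combinations(sorted(points), 2))


if __name__ == "__main__":
    pts, trs = projective_triple_system(3)
    print(len(pts), len(trs), is_sts(pts, trs))
```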


We denote this system by P(d — 1); it is called a projective triple system or projective geometry 
of dimension d — 1 over F. (There are geometric reasons for letting the dimension be d — 1 rather 
than d; these will appear later.) Fig. 8.5 shows the familiar STS(7) presented as P(2). 


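[Figure omitted: the seven points labelled by the non-zero vectors of Z/(2)^3, for example (010), (011), (001) along one line.] 
Fig. 8.5. P(2) 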


Projective systems have an important, and characteristic, property. A triangle in a STS is a set 
of three points not forming a triple. 
(8.5.1) Theorem. A STS is projective if and only if any triangle is contained in a subsystem of order 7. 


PROOF. Let (X, B) be a projective triple system, and {x, y, z} a triangle. Then x + y + z ≠ 0; so 
the seven points x, y, z, x + y, y + z, z + x, x + y + z are all distinct and are easily seen to form a 
subsystem. 


For the converse, let (X, B) be a STS in which every triangle is contained in a 7-point subsystem. 
We have to construct the algebraic structure of a vector space over Z/(2). This is an example of the 
procedure of ‘coordinatisation’ in geometry. 

Let 0 be a symbol not in X, and let V = X ∪ {0}. We define an operation + on V by the rules 
that, for all v ∈ V, 

0 + v = v + 0 = v, 

v + v = 0, 

and, if x, y ∈ X with x ≠ y, then x + y is the third point of the triple containing x and y. 

This operation is obviously commutative; 0 is the identity, and every element is its own inverse. 
We show that it is associative. There are several cases, most of which are trivial (for example, 
(x + 0) + y = x + y = x + (0 + y)). The only non-trivial case occurs when {x, y, z} is a triangle, in 
which case the structure of the STS(7) gives the required conclusion (see Fig. 8.6). 


[Figure omitted: the 7-point subsystem containing x, y, z, with the points y + z and (x + y) + z = x + (y + z) marked.] 
Fig. 8.6. The associative law 


We conclude that 
• (V, +) is an abelian group. 
Next we define a scalar multiplication on V, by elements of F, by the rules 

0 · v = 0,   1 · v = v, 

for all v ∈ V. We have 
• (V, +, ·) is a vector space over Z/(2). 
Again, most of the axioms are trivial. The most interesting is 

(α + β) · v = α · v + β · v. 

In the case α = β = 1, we have α + β = 0, and the result follows from the fact that v + v = 0. 
Now X is the set of non-zero vectors, and B the set of triples with sum 0, in V; so the system 
is projective. 


AFFINE TRIPLE SYSTEMS. 


There is a similar construction involving the field Z/(3). Let V be a d-dimensional vector space over 
this field. Let X = V, and 


B = {{x, y, z} ⊆ X : x, y, z distinct, x + y + z = 0}. 


CLAIM. (X, B) is a Steiner triple system of order 3^d. 

PROOF. Again, if x + y + z = 0, then any two of x, y, z determine the third. Suppose that x ≠ y. 
Then z ≠ x, since if z = x = −(x + y) then y = −2x = x; and similarly z ≠ y, so all three points are 
distinct. 
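
As with the projective case, the claim can be confirmed by direct computation for small d. The sketch below is illustrative only: vectors of Z/(3)^d are represented as tuples, and the function names are assumptions of this example.

```python
from itertools import combinations, product


def affine_triple_system(d):
    """Points: all vectors of Z/(3)^d. Triples: {x, y, z} with x + y + z = 0."""
    points = list(product(range(3), repeat=d))
    triples = set()
    for x, y in combinations(points, 2):
        # z is forced: z = -(x + y), computed coordinatewise mod 3
        z = tuple((-a - b) % 3 for a, b in zip(x, y))
        triples.add(frozenset((x, y, z)))
    return points, triples


def each_pair_once(points, triples):
    """True if every triple has 3 points and every pair lies in exactly one triple."""
    counts = {}
    for t in triples:
        if len(t) != 3:
            return False
        for pair in combinations(sorted(t), 2):
            counts[pair] = counts.get(pair, 0) + 1
    return all(counts.get(pair, 0) == 1 for pair in combinations(sorted(points), 2))


if __name__ == "__main__":
    pts, trs = affine_triple_system(2)
    print(len(pts), len(trs), each_pair_once(pts, trs))
```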

This system is called an affine triple system or affine geometry of dimension d over Z/(3). (Note 
the dimension!) It has a property resembling that of projective triple systems: 


In an affine triple system, any triangle is contained in a subsystem of order 9. 


(See Exercise 5.) 

The converse, surprisingly, is false. The first counterexample has order 81, and was constructed 
by Marshall Hall. As a result, the term Hall triple system is used for any Steiner triple system which 
is not affine but has the property that every triangle is contained in a subsystem of order 9. It is 
known that the order of a Hall triple system must be a power of 3, and that they exist for all orders 
which are powers of 3 and at least 81. 

Nobody knows any example of a Steiner triple system of order n in which each triangle lies in 
a unique subsystem of order k < n, for any k other than 7 or 9. 


NETTO SYSTEMS. 
These Steiner triple systems are constructed using the method we saw already for the STS(13). 


(8.5.2) Proposition. Let B_1,...,B_t be 3-subsets of Z/(n). Suppose that, for any non-zero element 
u ∈ Z/(n), there is a unique value of i ∈ {1,...,t} and unique x, y ∈ B_i such that u = x − y. Set 


B = {B_i + z : 1 ≤ i ≤ t, z ∈ Z/(n)}, 
where B_i + z = {b + z : b ∈ B_i}. Then (Z/(n), B) is a Steiner triple system. 


PROOF. Take two distinct elements x, y ∈ Z/(n); we have to show that a unique triple in B contains 
x and y. When do we have x, y ∈ B_i + z? This condition implies that x − z, y − z ∈ B_i; and 
(x − z) − (y − z) = x − y ≠ 0. So, given x and y, there is a unique choice of i; and the elements x − z 
and y − z (and hence z) are also determined. 


Note that the number of triples is tn = n(n − 1)/6; so n = 6t + 1, or n ≡ 1 (mod 6). Note also 
that the cyclic permutation x ↦ x + 1 (mod n) preserves the Steiner triple system. 


We will see that, for any prime number p ≡ 1 (mod 6), there exist sets B_1,...,B_t satisfying the 
hypothesis of (8.5.2). For this, we use the following fact: 


If p ≡ 1 (mod 6), then the field Z/(p) contains a primitive sixth root of unity 
(an element z satisfying z^6 = 1, z^k ≠ 1 for 0 < k < 6). 


The algebraic explanation of this fact is that the multiplicative group of Z/(p) is a cyclic group of 
order p − 1, and so (if 6 | p − 1) contains a cyclic subgroup of order 6. 
Since 0 = z^6 − 1 = (z^3 − 1)(z + 1)(z^2 − z + 1), and z^3 ≠ 1, z ≠ −1, we have z^2 − z + 1 = 0. We 
note the equations 

1 = 1 − 0,   z = z − 0,   z^2 = z − 1,   z^3 = 0 − 1,   z^4 = 0 − z,   z^5 = 1 − z. 


Now set t = (p − 1)/6, and let s_1,...,s_t be coset representatives for the distinct cosets of 
the subgroup generated by z in the multiplicative group of Z/(p). Then let B_i = {0, s_i, s_i z} for 
i = 1,...,t. Then every non-zero residue mod p is uniquely expressible in the form s_i z^j, where 
1 ≤ i ≤ t and 0 ≤ j ≤ 5. According to the displayed equations, it is uniquely expressible in the form 
x − y for some x, y ∈ B_i and some i. This proves the claim. 


The STS we have constructed is called a Netto system of order p, denoted by N (p). 
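
The steps of this construction can be followed in a short program. This is a sketch under the assumption that p is a prime with p ≡ 1 (mod 6); the root z is found by brute force as a solution of z^2 − z + 1 = 0, and the coset representatives are simply the smallest elements of each coset. All names are illustrative.

```python
from itertools import combinations


def netto_system(p):
    """Base blocks {0, s_i, s_i*z} and their translates in Z/(p).
    Assumes p is prime and p % 6 == 1."""
    # a root of z^2 - z + 1 = 0 is a primitive sixth root of unity mod p
    z = next(a for a in range(2, p) if (a * a - a + 1) % p == 0)
    subgroup = {pow(z, j, p) for j in range(6)}
    reps, seen = [], set()
    for s in range(1, p):
        if s not in seen:            # s starts a new coset of <z>
            reps.append(s)
            seen |= {(s * g) % p for g in subgroup}
    base = [(0, s, (s * z) % p) for s in reps]
    triples = {frozenset((b + c) % p for b in blk) for blk in base for c in range(p)}
    return base, triples


def each_pair_once(n, triples):
    """True if every pair of residues mod n lies in exactly one triple."""
    counts = {}
    for t in triples:
        for pair in combinations(sorted(t), 2):
            counts[pair] = counts.get(pair, 0) + 1
    return all(counts.get(pair, 0) == 1 for pair in combinations(range(n), 2))


if __name__ == "__main__":
    base, triples = netto_system(13)
    print(base, len(triples), each_pair_once(13, triples))
```

For p = 13 the search finds z = 4 and two base blocks, giving the 26 triples of a STS(13).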


The construction can be generalised, using finite fields. In Section 4.7, we briefly discussed the 
theorem of Galois, guaranteeing a unique field GF(q) of any prime power order q. It is also true 
that the multiplicative group of GF(q) is cyclic. So the construction of a Netto system N(q) of prime 
power order q ≡ 1 (mod 6) works exactly as for prime order p, with GF(q) replacing Z/(p) in the 
construction. See Exercise 2 for an example of this. 

8.6. Project: Tournaments and Kirkman’s schoolgirls 


In this section, we construct Kirkman’s own solution to his Schoolgirls Problem. 


We begin with a detour. The schoolgirls enjoy playing hockey, and the school has a team in 
a league, playing matches against other school teams at weekends during term. In the course of a 
season, every team plays against every other team once. If there are n teams in the competition, what 
is the least number of rounds required to play all the matches? 

The number of matches to be played is (n choose 2) = n(n − 1)/2. If n is even, then n/2 matches can be 
played in every round, so we need (at least) n − 1 rounds. If n is odd, then only (n − 1)/2 matches 
can be played in a round, with one team having a bye; so n rounds are required. A tournament 
schedule for n teams is an arrangement of all pairs of teams into the minimum number of rounds 
just calculated (viz. n − 1 if n is even, n if n is odd). 

Of course, we cannot guarantee that tournament schedules exist on the basis of this argument; 
but there is a simple construction, as follows. First, consider the case where n is odd. Draw a regular 
n-gon in the plane, and number its vertices 0,..., n − 1 corresponding to the teams (these numbers 
are regarded as belonging to the integers mod n). For each edge of the n-gon, there are (n − 3)/2 
diagonals parallel to this edge; this parallel class determines the matches in a round, with the team 
corresponding to the vertex opposite the chosen edge having a bye. Fig. 8.7 shows the case n = 5. 


Fig. 8.7. Tournament schedule: five teams 


This construction can be presented algebraically: the edge and diagonals in the parallel class 
not containing the vertex i have the form {j,k}, where j + k = 2i (in Z/(n)). 

For n even, we temporarily remove one team from the competition, and construct a tournament 
schedule with n − 1 rounds for the remaining teams as above. Then we decree that, in each round, 
the extra team will play the team which would otherwise have had a bye in that round. 
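
The algebraic form of the construction translates directly into code. The following sketch (function name and team numbering are assumptions of this example) handles both parities: for even n the last team, numbered n − 1, is the one temporarily removed.

```python
def tournament_schedule(n):
    """Rounds of a tournament schedule for n >= 2 teams numbered 0..n-1.
    Round i pairs j with k whenever j + k = 2i modulo the odd number m."""
    m = n if n % 2 == 1 else n - 1          # number of polygon vertices (odd)
    rounds = []
    for i in range(m):
        # vertex i is the only solution of 2j = 2i (mod m), so it is left unpaired
        pairs = [(j, (2 * i - j) % m) for j in range(m) if j < (2 * i - j) % m]
        if n % 2 == 0:
            pairs.append((i, n - 1))        # the extra team plays the team with the bye
        rounds.append(pairs)
    return rounds


if __name__ == "__main__":
    for number, matches in enumerate(tournament_schedule(5)):
        print("round", number, matches)
```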

Now we present Kirkman’s marching orders for his schoolgirls. First we construct a STS(15). 
Divide the 15 schoolgirls into a group X of 7 girls and a group Y of 8. We take X = {x_0,...,x_6} to 
be the point set of a Steiner triple system STS(7). Also, we take Y to ‘be’ the teams in a tournament 
with 7 rounds R_0,...,R_6. Each R_i consists of four disjoint pairs of girls; we add girl x_i to each of 
these pairs to form a triple. In this way we get 28 triples which, together with the 7 triples of the 
STS(7), form 35 triples, the right number for a STS(15). 

We check that it really is a STS(15). Any two girls in X belong to a unique triple of the 
subsystem. Any two girls in Y form a pair belonging to one round R_i of the tournament, and so lie 
in a triple with x_i. Finally, take a girl in X (say x_i) and a girl y ∈ Y: the unique triple containing 
them is {x_i, y, y'}, where {y, y'} belongs to round R_i. 

Finally, we have to divide the triples into seven sets of five, corresponding to the walking 
groups on the seven days of the week. For this, we exploit the cyclic structure of both the STS(7) 
and the tournament schedule. We can take the triples of the STS(7) to be {B_0,...,B_6}, where 
B_i = {x_{i+1}, x_{i+2}, x_{i+4}}. Label the girls in Y as {y_0,...,y_6, z}, where the i-th round R_i of the 
tournament consists of {y_i, z} and all {y_j, y_k} with j + k = 2i. Then {x_0, y_0, z}, {y_4, y_6, x_5}, 
{y_2, y_3, x_6}, and {y_1, y_5, x_3} are triples (since, for example, 4 + 6 = 2 × 5). Together with {x_1, x_2, x_4}, 
these make up the groups for day 0: every girl is in one group. Now the groups for day i are obtained 
by adding i to the subscripts of the x's and y's. 
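
The marching orders can be written out and checked mechanically. In the sketch below the girls are encoded as pairs such as ("x", 3) and the fixed girl z as ("z", 0); this encoding and the function names are choices made for the illustration only.

```python
from itertools import combinations


def kirkman_days():
    """The five walking groups for each of the seven days, from the day-0 groups."""
    z = ("z", 0)
    days = []
    for i in range(7):
        def x(k, shift=i):
            return ("x", (k + shift) % 7)

        def y(k, shift=i):
            return ("y", (k + shift) % 7)

        days.append([
            frozenset({x(0), y(0), z}),
            frozenset({y(4), y(6), x(5)}),
            frozenset({y(2), y(3), x(6)}),
            frozenset({y(1), y(5), x(3)}),
            frozenset({x(1), x(2), x(4)}),
        ])
    return days


def pair_counts(days):
    """How often each pair of girls walks together over all days."""
    counts = {}
    for day in days:
        for group in day:
            for pair in combinations(sorted(group), 2):
                counts[pair] = counts.get(pair, 0) + 1
    return counts


if __name__ == "__main__":
    days = kirkman_days()
    counts = pair_counts(days)
    print(len(counts), set(counts.values()))
```

Every one of the 105 pairs should appear with count 1, and each day's five groups should cover all 15 girls.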

In general, a Steiner triple system, whose triples can be partitioned into classes 
with the property that each point lies in a unique triple of every class, is called a 
Kirkman system. 


8.7. Exercises 


1. Kirkman’s original (incomplete, but basically correct) proof of the existence of 
Steiner triple systems went as follows. Kirkman defined two kinds of structure: S_n, 
what we have called a Steiner triple system of order n; and S'_n, whose exact details 
don’t concern us here. He claimed to show: 

(a) S_1 exists; 

(b) if S_n exists, then S_{2n+1} exists; 

(c) if S_n exists and n > 1, then S'_{n−2} exists; 

(d) if S'_n exists, then S_{2n−1} exists. 

Prove that, from (a)–(d), it follows that S_n exists for all positive integers n ≡ 1 or 3 
(mod 6). For which values of n does S'_n exist? 

2. Construct a Netto system of order 25. 

[Hint: As in Section 4.7, we have to find an irreducible quadratic over Z/(5); 
use it to construct GF(25), and then find a primitive sixth root of unity in this field. 
But all this can be simplified. We know that z must satisfy z^2 − z + 1 = 0, and this 
quadratic is irreducible over Z/(5); so let 

GF(25) = {a + bz : a, b ∈ Z/(5)}, 
where z^2 = z − 1. All that remains is to find the coset representatives of the subgroup 
generated by z.] 
3. Prove that, given any STS(7), its points can be numbered 1,...,7 so that its 
triples are those listed in Fig. 8.1(a). Prove a similar statement for STS(9). 
[Hint: show that any two triples of a STS(7) must meet; while, in a STS(9), there 
are just two triples disjoint from a given triple, and these are disjoint from one 
another.] 

Formally, an isomorphism between Steiner triple systems (X_1, B_1) and (X_2, B_2) 
is a bijective map f : X_1 → X_2 which carries the triples in B_1 to those in B_2. You 
are asked to prove that Steiner triple systems of orders 7 and 9 are unique up to 
isomorphism. 

HARDER PROBLEM. Prove that there are just two non-isomorphic Steiner triple 
systems of order 13. 

REMARK. After this, things get more difficult. There are exactly 80 non-isomorphic 
STS of order 15, and millions of non-isomorphic STS(19) (the exact number has 
never been determined). 


4. An automorphism of a Steiner triple system is an isomorphism from the system to 
itself. Prove that a Steiner triple system of order 7 or 9 has 168 or 432 automorphisms 
respectively. 
5. (a) Prove that, in an affine triple system, each triangle lies in a subsystem of 
order 9. 

(b) Prove that an affine triple system is a Kirkman system. 
6. Verify the following values of the packing and covering functions for small n. 


n      3   4   5   6   7   8   9 
p(n)   1   1   2   4   7   8  12 
c(n)   1   3   4   6   7  11  12 


EXERCISES ON STEINER QUADRUPLE SYSTEMS. 


A Steiner quadruple system (SQS) is a pair (X, B), where X is a set, and B a collection 
of 4-element subsets of X called quadruples, with the property that any three points 
of X are contained in a unique quadruple. The number n = |X| is called the order 
of the quadruple system. 


7. If a SQS of order n exists, with n > 2, then n ≡ 2 or 4 (mod 6). 
[This condition is also sufficient, but the proof is more difficult.] 


8. If (X, B) is a SQS of order n, then |B| = n(n — 1)(n — 2) /24. 


9. Let X be a vector space over Z/(2), and let B be the set of 4-subsets {x, y, z, w} 
of X for which x + y + z + w = 0. Show that (X, B) is a SQS. 


10. Let (X, B) be a SQS of order n > 2. Take a disjoint copy (X', B') of this system. 
Take a tournament schedule on X with rounds R_1,...,R_{n−1}, and one on X' with 
rounds R'_1,...,R'_{n−1}. (This is possible since n is even; see Section 8.6.) Now let 
Y = X ∪ X', and C = B ∪ B' ∪ R, where R is the set of 4-sets {x, y, z', w'} such that 
• x, y ∈ X, z', w' ∈ X'; 
• for some i (1 ≤ i ≤ n − 1), {x, y} ∈ R_i and {z', w'} ∈ R'_i. 
Show that (Y, C) is a SQS of order 2n. 


EXERCISES ON SUBSYSTEMS. 


11. Let (X, B) be a STS of order n, and Y a subsystem of order m, where m < n. 
Prove that n ≥ 2m + 1. Show further that n = 2m + 1 if and only if every triple in 
B contains either 1 or 3 points of Y. 


12. Let (X, B) be a STS of order n = 2m + 1, and Y a subsystem of order m; say 
Y = {y_1,...,y_m}. For i = 1,...,m, let R_i be the set of all pairs {z, z'} ⊆ X \ Y for 
which {y_i, z, z'} ∈ B. Show that {R_1,...,R_m} is a tournament schedule on X \ Y. 

Show further that this construction can be reversed: a STS(m) and a tournament 
schedule of order m + 1 can be used to build a STS(2m + 1). 


13. Let (X, B) and (Y, C) be STS, of orders m and n respectively. Let Z = X × Y, 
and let D consist of all triples of the following types: 

• {(x, y_1), (x, y_2), (x, y_3)} for x ∈ X, {y_1, y_2, y_3} ∈ C; 

• {(x_1, y), (x_2, y), (x_3, y)} for {x_1, x_2, x_3} ∈ B, y ∈ Y; 

• {(x_1, y_1), (x_2, y_2), (x_3, y_3)} for {x_1, x_2, x_3} ∈ B, {y_1, y_2, y_3} ∈ C. 
(Note that a triple in B and one in C give rise to six triples of the third type, 
corresponding to the six possible bijections from one to the other.) 

Show that (Z, D) is a STS of order mn. Show further that, if m > 1 and n > 1, 
then (Z, D) contains a subsystem of order 9. 

14. What can you say about the set 
{n: there exists a STS(n) with a subsystem of order 9}? 


15. COMPUTING PROJECT. Recall the ‘nine schoolgirls problem’ posed in Chapter 1: 
nine schoolgirls are to walk, each day in sets of three, for four days, so that each pair 
of girls walks together once. We've seen that this problem has a unique solution: 
there is a unique STS(9) up to isomorphism (Exercise 4), and there is a unique way 
of partitioning its twelve triples into four sets of three with the required property. 
Now we add a further twist to the problem: 


Arrange walks for the girls for twenty-eight days (divided into seven 
groups of four) so that 

• in each group of four days, any two girls walk together once; 

• in the entire month, any three girls walk together once. 


In other words, we are asked to partition the (9 choose 3) = 84 triples of girls into seven 
12-sets, each of which forms a Steiner triple system. 

There are 840 different Steiner triple systems on a given 9-set,³ and so potentially 
(840 choose 7) possibilities to check: rather a large number! We make one simplifying 
assumption. (This means that, if we fail to find a solution, we have not demonstrated 
that no solution exists.) We assume that 

the required seven STS(9)s can be obtained by applying all powers 
of a permutation θ of order 7 to a given one. 


We can assume that the starting system is the one of Fig. 8.1(b), with point set 
X = {1,...,9}, and triple set 
B = {123, 456, 789, 147, 258, 369, 159, 267, 348, 357, 168, 249}. 


We can also assume, without loss, that θ fixes the points 1 and 2, and acts as a 
7-cycle on the others. (That no generality is lost here depends on the symmetry of 
the STS(9): all pairs of points are ‘alike’.) Finally, there is a unique power of θ 
which maps 3 to 4; so we may assume that θ itself does so. Thus, in cycle notation, 


θ = (1)(2)(3 4 a b c d e), 


where a,...,e are 5,...,9 in some order; in other words, 

\[ \begin{pmatrix} 5 & 6 & 7 & 8 & 9 \\ a & b & c & d & e \end{pmatrix} \] 

is a permutation in two-line notation. 
Thus our algorithm is as follows: 

• set up the system (X, B); 

• generate in turn all permutations of 5,...,9 (written a b c d e as above); for each, let θ = (1)(2)(3 4 a b c d e), 
and check whether B, Bθ,...,Bθ^6 are pairwise disjoint. Report success if so. 
Program this calculation. (You should find two permutations which give rise to 
a solution.) 
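
One possible shape for such a program is sketched below in Python. It is an outline under the simplifying assumption stated above, not a definitive solution; the names BASE, apply_perm and search are inventions of this sketch. Since the seven systems have twelve triples each, they are pairwise disjoint exactly when their union contains all 84 triples.

```python
from itertools import permutations

BASE = ["123", "456", "789", "147", "258", "369",
        "159", "267", "348", "357", "168", "249"]
B = {frozenset(int(c) for c in t) for t in BASE}


def apply_perm(theta, system):
    """Image of a set of triples under the point permutation theta (a dict)."""
    return {frozenset(theta[p] for p in t) for t in system}


def search():
    """Return the 7-cycles (3 4 a b c d e) whose powers give 7 disjoint systems."""
    solutions = []
    for tail in permutations([5, 6, 7, 8, 9]):
        cycle = (3, 4) + tail
        theta = {1: 1, 2: 2}                       # theta fixes 1 and 2
        for j, p in enumerate(cycle):
            theta[p] = cycle[(j + 1) % 7]          # each point maps to the next in the cycle
        systems = [B]
        for _ in range(6):
            systems.append(apply_perm(theta, systems[-1]))
        if len(set().union(*systems)) == 84:       # 7 x 12 triples, all different
            solutions.append(cycle)
    return solutions


if __name__ == "__main__":
    for cycle in search():
        print(cycle)
```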
16. Here is a related problem. Cayley showed that it is impossible to partition the 
(7 choose 3) = 35 triples from a 7-set into five disjoint Steiner triple systems. In fact, no more 
than two disjoint STS(7)s can be found. Verify this observation. 
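
A brute-force check of this observation is feasible by machine. The sketch below assumes the uniqueness of the STS(7) up to isomorphism (Exercise 3), so that every STS(7) on the point set {0,...,6} is an image of one fixed system under a permutation; the starting system FANO and the function names are choices made for this illustration.

```python
from itertools import permutations

# one STS(7): the translates of {0, 1, 3} modulo 7
FANO = [(0, 1, 3), (1, 2, 4), (2, 3, 5), (3, 4, 6), (4, 5, 0), (5, 6, 1), (6, 0, 2)]


def all_sts7():
    """All distinct images of FANO under the 5040 permutations of the points."""
    systems = set()
    for perm in permutations(range(7)):
        systems.add(frozenset(frozenset(perm[p] for p in t) for t in FANO))
    return list(systems)


def max_disjoint(systems):
    """Size of the largest family of pairwise disjoint systems (backtracking)."""
    best = 0

    def extend(chosen, start):
        nonlocal best
        best = max(best, len(chosen))
        for idx in range(start, len(systems)):
            candidate = systems[idx]
            if all(candidate.isdisjoint(c) for c in chosen):
                extend(chosen + [candidate], idx + 1)

    extend([], 0)
    return best


if __name__ == "__main__":
    systems = all_sts7()
    print(len(systems), max_disjoint(systems))
```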


³ For a proof of this fact, see Chapter 14. 


9. Finite geometry 


In Plane Geometry that afternoon, I got into an argument with Mr Shull, the 
teacher, about parallel lines. I say they have to meet. I'm beginning to think 
everything comes together somewhere. 


William Wharton, Birdy (1979) 


TOPICS: Finite fields; Gaussian coefficients; projective and affine 
geometries; projective planes 


TECHNIQUES: Linear algebra 
ALGORITHMS: 


CROSS-REFERENCES: Binomial coefficients (Chapter 3); Orthogonal 
Latin squares (Chapter 6); De Bruijn–Erdős Theorem (Chapter 7); 
Steiner triple systems (Chapter 8) 


Projective geometry over finite fields is a topic of great importance, for many reasons. 
It provides a large collection of highly symmetric structures, with interesting groups 
of collineations; it is a so-called ‘q-analog’ of the family of subsets of a set, providing 
an interesting perspective; and it ties in with almost everything else we have met so 
far. 


9.1. Linear algebra over finite fields 
We already met in Section 4.7 the basic fact about the existence of finite fields: 


(9.1.1) Finite fields 
There exists a field with q elements if and only if q is a prime power. 
If so, then the field is unique up to isomorphism. It is called the 
Galois field of order q, and denoted by GF(q). 


This fact is proved in any good algebra textbook. I have included an outline of 
the proof at the end of this chapter (Section 9.9). If you haven’t met it before, and 
have trouble with the algebra involved, you may take it on trust, and keep in mind 
the case when the order is prime. (The Galois field of prime order p is the field 
Z/(p) of integers modulo p.) 

In traditional linear algebra, it is usually assumed that the field over which we 
work is the field of real numbers (or possibly some variant, such as the rational or 
complex numbers). However, almost everything works the same over finite fields. 
The definition of linearly independent set, spanning set, basis, subspace; the formula 


dim(U ∩ W) + dim(U + W) = dim(U) + dim(W), 


the representation of linear maps by matrices, and the rank and nullity formula, all 
work as usual. 

Row operations and reduced echelon form also work in the same way; but, since 
we will need these, I will sketch them. The three types of row operation on a matrix 
are: 

• multiply a row by a non-zero scalar; 

• add a multiple of one row to another; 

• interchange two rows. 
These operations do not change the linear dependence or independence of the rows 
of the matrix, and also do not change the row space (the subspace spanned by the 
rows). 

A matrix A = (a_ij) is said to be in reduced echelon form if the following three 
conditions hold: 

• given any row of A, either it is zero, or the first non-zero entry is a 1 (a so-called 
‘leading 1’); 

• for any i > 1, if the i-th row is non-zero, then so is the (i − 1)-th, and its leading 1 
is to the left of the leading 1 in the i-th row; 
• if a column contains the leading 1 of some row, then all its other entries are 0. 
Now the following result holds: 


(9.1.2) Proposition. Any matrix can be put into reduced echelon form by applying a 
series of elementary row operations; and the reduced echelon form is unique. 


If a matrix is in reduced echelon form, then its rows are linearly independent if 
and only if the last row is non-zero; this is the familiar test for linear independence 
of a set of vectors. 

Note that, for linear algebra, the weaker notion of ‘echelon form’ (where the 
third condition in the definition is deleted) suffices; but, for us, a crucial fact about 
reduced echelon form is its uniqueness, and this is not true for the weaker form. 
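
For fields of prime order, the reduction is easy to carry out by machine. The sketch below works over Z/(p) only (not over a general GF(q)) and relies on Python 3.8 or later for the modular inverse pow(a, -1, p); the function name is an assumption of this example.

```python
def rref_mod_p(matrix, p):
    """Reduced echelon form of a matrix (list of rows) over Z/(p), p prime."""
    rows = [[a % p for a in row] for row in matrix]
    ncols = len(rows[0]) if rows else 0
    lead_row = 0
    for col in range(ncols):
        if lead_row == len(rows):
            break
        # find a row at or below lead_row with a non-zero entry in this column
        pivot = next((r for r in range(lead_row, len(rows)) if rows[r][col]), None)
        if pivot is None:
            continue
        rows[lead_row], rows[pivot] = rows[pivot], rows[lead_row]   # interchange rows
        inv = pow(rows[lead_row][col], -1, p)
        rows[lead_row] = [(a * inv) % p for a in rows[lead_row]]    # make the leading entry 1
        for r in range(len(rows)):
            if r != lead_row and rows[r][col]:
                f = rows[r][col]                                    # clear the rest of the column
                rows[r] = [(a - f * b) % p for a, b in zip(rows[r], rows[lead_row])]
        lead_row += 1
    return rows


if __name__ == "__main__":
    print(rref_mod_p([[1, 1, 0], [1, 0, 1], [0, 1, 1]], 2))
```

In the example the three rows are dependent over Z/(2), which shows up as a zero last row.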


9.2. Gaussian coefficients 


We are now going to do some counting in vector spaces over finite fields. Let V(n, q) 
denote an n-dimensional vector space over GF(q). First, the number of vectors: 


(9.2.1) Proposition. The number of vectors in V(n, q) is equal to q^n. 


PROOF. As usual, by choosing a basis, we represent the vectors by all n-tuples of 
elements of GF(q); and there are q^n of these. 


The Gaussian coefficient $\begin{bmatrix} n \\ k \end{bmatrix}_q$ is defined to be the number of k-dimensional 
subspaces of V(n, q). 




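(9.2.2) Gaussian coefficients 

\[ \begin{bmatrix} n \\ k \end{bmatrix}_q = \frac{(q^n - 1)(q^{n-1} - 1)\cdots(q^{n-k+1} - 1)}{(q^k - 1)(q^{k-1} - 1)\cdots(q - 1)}. \] 


PROOF. First we show: 


The number of linearly independent k-tuples in V(n, q) is equal to 

\[ (q^n - 1)(q^n - q)\cdots(q^n - q^{k-1}). \] 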


This is proved by examining the number of choices of each vector. A k-tuple of 
vectors is linearly independent if and only if no vector lies in the subspace spanned 
by the preceding vectors. Thus, the first vector can be anything except zero (q^n − 1 
choices); the second must lie outside the 1-dimensional subspace spanned by the 
first (q^n − q choices); and, in general, the i-th must lie outside the (i − 1)-dimensional 
subspace spanned by its predecessors (q^n − q^{i−1} choices). Multiplying these numbers 
gives the result. 

Now a k-dimensional subspace is spanned by k linearly independent vectors, 
and we have counted these. But a given subspace U will have many different bases. 
How many? Just the number of linearly independent k-tuples in a k-dimensional 
subspace, which is found from the same formula by putting k in place of n. We 
must divide by this number to obtain the number of subspaces. Cancelling powers 
of q gives the quoted formula. 


Now the number of k-dimensional subspaces of V (n, q) is equal to the number 
of k x n matrices over GF(q) which are in reduced echelon form and have no zero 
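rows. This gives another way to calculate $\begin{bmatrix} n \\ k \end{bmatrix}_q$. 


EXAMPLE. Let n = 4 and k = 2. Our formula gives 

\[ \begin{bmatrix} 4 \\ 2 \end{bmatrix}_q = \frac{(q^4 - 1)(q^3 - 1)}{(q^2 - 1)(q - 1)} = (q^2 + 1)(q^2 + q + 1) = q^4 + q^3 + 2q^2 + q + 1. \] 

We check by counting matrices. The possible shapes are 

\[ 
\begin{pmatrix} 1 & 0 & * & * \\ 0 & 1 & * & * \end{pmatrix},\ 
\begin{pmatrix} 1 & * & 0 & * \\ 0 & 0 & 1 & * \end{pmatrix},\ 
\begin{pmatrix} 1 & * & * & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix},\ 
\begin{pmatrix} 0 & 1 & 0 & * \\ 0 & 0 & 1 & * \end{pmatrix},\ 
\begin{pmatrix} 0 & 1 & * & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix},\ 
\begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, 
\] 

where * denotes an arbitrary element. So there are q^4 + q^3 + q^2 + q^2 + q + 1 matrices. 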
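
The formula (9.2.2) can also be compared with a direct count. The sketch below evaluates the product formula for any integer q, and, for q = 2 only, counts subspaces by brute force with vectors encoded as bitmasks; both function names are illustrative.

```python
from itertools import combinations


def gaussian_coefficient(n, k, q):
    """Product formula (9.2.2); the division is exact."""
    if k < 0 or k > n:
        return 0
    num = den = 1
    for i in range(k):
        num *= q ** (n - i) - 1
        den *= q ** (k - i) - 1
    return num // den


def count_subspaces_gf2(n, k):
    """Number of k-dimensional subspaces of Z/(2)^n, by listing spans."""
    spans = set()
    for basis in combinations(range(1, 2 ** n), k):
        span = {0}
        for v in basis:
            span |= {s ^ v for s in span}      # add v to everything found so far
        if len(span) == 2 ** k:                # the k vectors were independent
            spans.add(frozenset(span))
    return len(spans)


if __name__ == "__main__":
    print(gaussian_coefficient(4, 2, 2), count_subspaces_gf2(4, 2))
```

For q = 2 both computations give 35 = 16 + 8 + 8 + 2 + 1, matching the polynomial in the example.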
REMARK 1. If we regard the Gaussian coefficient as a function of the real variable q 
(where n and k are fixed integers), then we find that 

\[ \lim_{q \to 1} \begin{bmatrix} n \\ k \end{bmatrix}_q = \binom{n}{k}. \] 

For, by l’Hôpital’s rule, we have 

\[ \lim_{q \to 1} \frac{q^a - 1}{q^b - 1} = \lim_{q \to 1} \frac{a q^{a-1}}{b q^{b-1}} = \frac{a}{b} \] 

for a, b ≠ 0; so 

\[ \lim_{q \to 1} \begin{bmatrix} n \\ k \end{bmatrix}_q = \frac{n(n-1)\cdots(n-k+1)}{k(k-1)\cdots 1} = \binom{n}{k}. \] 

For this reason, the Gaussian coefficients are sometimes called the ‘q-analogs’ of the 
binomial coefficients. 


REMARK 2. The Gaussian coefficients can be given a combinatorial interpretation 
for all positive integer values of q greater than 1, not just prime powers. For 
let Q be any set of size q, containing two distinguished elements called 0 and 1. 
Then the definition of a matrix in reduced echelon form over Q makes sense, even 
though the algebraic interpretation is lost. The number of k × n matrices in reduced 
echelon form with no zero rows is given by a polynomial in q. But, for infinitely 
many values (all the prime powers), this polynomial coincides with the Gaussian 
coefficient (which is also a polynomial); so they are identically equal. 


The matrix interpretation enables us to give a recurrence relation for the Gaussian 
coefficients: 

(9.2.3) Theorem. 

\[ \begin{bmatrix} n+1 \\ k \end{bmatrix}_q = \begin{bmatrix} n \\ k-1 \end{bmatrix}_q + q^k \begin{bmatrix} n \\ k \end{bmatrix}_q. \] 


PROOF. Consider k × (n + 1) matrices in reduced echelon form, with no zero rows. 
Divide them into two classes: those for which the leading 1 in the last row occurs 
in the last column; and the others. Those of the first type correspond to (k − 1) × n 
matrices in reduced echelon form with no zero rows, since the last row and column are 
zero apart from the bottom-right entry. Those of the second type consist of a k × n 
matrix in reduced echelon form with no zero rows, with a column containing arbitrary 
elements adjoined on the right. Since there are q^k choices for this column, the 
recurrence relation follows. 


Note that this relation reduces to the binomial recurrence when q = 1. However, 
unlike the binomial recurrence, it is not ‘symmetric’. (For a symmetric form, see 
Exercise 3.) In fact, we have: 


(9.2.4) Proposition. 

\[ \begin{bmatrix} n \\ k \end{bmatrix}_q = \begin{bmatrix} n \\ n-k \end{bmatrix}_q. \] 

PROOF. This follows from the bijection between k-dimensional subspaces of V = 
V(n, q) and (n − k)-dimensional subspaces of its dual space V* (where a subspace 
of V corresponds to its annihilator in V*). 


Thus, we obtain another recurrence: 

\[ \begin{bmatrix} n+1 \\ k \end{bmatrix}_q = \begin{bmatrix} n \\ k \end{bmatrix}_q + q^{n-k+1} \begin{bmatrix} n \\ k-1 \end{bmatrix}_q \] 

(see Exercise 4). 


From the formula for the Gaussian coefficients, we can deduce another result 
analogous to a binomial coefficient identity: 

\[ (q^k - 1) \begin{bmatrix} n \\ k \end{bmatrix}_q = (q^n - 1) \begin{bmatrix} n-1 \\ k-1 \end{bmatrix}_q. \] 

In fact, quite a lot of the combinatorics of binomial coefficients can be extended to 
their q-analogs; but we have enough for our needs now. 


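We can use the recurrence relation above to prove a pretty analogue of the 
Binomial Theorem (3.3.1): 


(9.2.5) q-binomial Theorem 
For n ≥ 1, 

\[ \prod_{i=0}^{n-1} (1 + q^i t) = \sum_{k=0}^{n} q^{k(k-1)/2} \begin{bmatrix} n \\ k \end{bmatrix}_q t^k. \] 


PROOF. The proof is a straightforward induction. For n = 1, both sides are 1 + t. 
Suppose that the result is true for n. Then 

\[ \prod_{i=0}^{n} (1 + q^i t) = \Bigl( \sum_{k=0}^{n} q^{k(k-1)/2} \begin{bmatrix} n \\ k \end{bmatrix}_q t^k \Bigr) (1 + q^n t). \] 

The coefficient of t^k on the right is 

\begin{align*} 
q^{k(k-1)/2} \begin{bmatrix} n \\ k \end{bmatrix}_q + q^{(k-1)(k-2)/2 + n} \begin{bmatrix} n \\ k-1 \end{bmatrix}_q 
&= q^{k(k-1)/2} \Bigl( \begin{bmatrix} n \\ k \end{bmatrix}_q + q^{n-k+1} \begin{bmatrix} n \\ k-1 \end{bmatrix}_q \Bigr) \\ 
&= q^{k(k-1)/2} \begin{bmatrix} n+1 \\ k \end{bmatrix}_q, 
\end{align*} 

as required. 
Letting q → 1, we obtain the usual Binomial Theorem. 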
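
The identity can be tested numerically for particular integer values of q by expanding the product as a polynomial in t. The sketch below is illustrative; it repeats a small routine for the Gaussian coefficient so that it is self-contained, and the function names are assumptions of this example.

```python
def gaussian_coefficient(n, k, q):
    """Product formula for the Gaussian coefficient; the division is exact."""
    if k < 0 or k > n:
        return 0
    num = den = 1
    for i in range(k):
        num *= q ** (n - i) - 1
        den *= q ** (k - i) - 1
    return num // den


def product_coefficients(n, q):
    """Coefficients (in t) of the product of (1 + q^i t) for i = 0..n-1."""
    coeffs = [1]
    for i in range(n):
        new = coeffs + [0]                       # multiply by 1 ...
        for k in range(len(coeffs)):
            new[k + 1] += coeffs[k] * q ** i     # ... and by q^i t
        coeffs = new
    return coeffs


def sum_coefficients(n, q):
    """Coefficients of the right-hand side of the q-binomial theorem."""
    return [q ** (k * (k - 1) // 2) * gaussian_coefficient(n, k, q) for k in range(n + 1)]


if __name__ == "__main__":
    print(product_coefficients(3, 2), sum_coefficients(3, 2))
```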


It’s now easy to count the non-singular matrices. 
(9.2.6) Theorem. The number of non-singular n × n matrices over GF(q) is 
(q^n − 1)(q^n − q)···(q^n − q^{n−1}). 


PROOF. A square matrix is non-singular if and only if its rows are linearly independent. 
We counted linearly independent k-tuples above. 


Note that the non-singular n x n matrices form a group, the so-called general 
linear group GL(n, q). The theorem above computes the order of this group. 
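
As a small check of (9.2.6), the formula can be compared with a direct count of 2 × 2 matrices with non-zero determinant over Z/(p). This is a sketch for prime p only, with illustrative function names.

```python
from itertools import product


def gl_order(n, q):
    """Order of GL(n, q) from (9.2.6)."""
    result = 1
    for i in range(n):
        result *= q ** n - q ** i
    return result


def count_invertible_2x2(p):
    """Number of 2 x 2 matrices over Z/(p), p prime, with non-zero determinant."""
    return sum(1 for a, b, c, d in product(range(p), repeat=4)
               if (a * d - b * c) % p != 0)


if __name__ == "__main__":
    for p in (2, 3, 5):
        print(p, gl_order(2, p), count_invertible_2x2(p))
```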


9.3. Projective geometry 


The definition of projective geometry seems strange at first meeting. We'll make a 
short detour to see where it came from. 

One of the goals of painting is to create a 2-dimensional picture whose effect on 
a viewer approximates that of the 3-dimensional scene it depicts. In the European 
renaissance, painters began to approach this problem mathematically. Let us idealise 
the situation, and assume that the painter’s eye is a point, and take this point to be 
the origin of a coordinate system for space. He sees an object by means of a ray of 
light from the object to his eye. Another object seen by a ray in the same direction 
will appear in the same place. (In practice, of course, the nearer object will hide 
the further one). Thus, the points of the painter’s perceptual space can be identified 
with semi-infinite rays through the origin. 


Fig. 9.1. Perspective 


The painter wants to represent his perceptual space in a plane. He sets up a 
‘picture plane’ Π, not passing through his eye. A typical ray will meet Π in a single 
point, which can be taken to represent that ray (and hence, to represent objects 
for which that ray is the line of sight). Assuming that Π is a mathematical plane, 
extending infinitely in all directions, then the rays represented are all those on one 
side of the plane Π′ through the painter’s eye parallel to Π. 
Mathematically, it is simpler to replace rays by lines through the origin, extending 
in both directions. (The painter doesn’t have eyes in the back of his head, and so 
he will not actually picture objects behind him.) With this convention, every line 
through the origin is represented by a unique point in the picture plane Π, except 
for the lines in Π′ (that is, the lines parallel to Π). This led to the convention of 
adjoining mathematical ‘ideal points’ to Π to represent these lines, forming the real 
projective plane. 

Thus, the real projective plane can be regarded in either of two ways: the 
picture plane Π with ‘ideal points’ added, or the set of all lines through the origin 
(1-dimensional subspaces) of 3-dimensional space R^3. The second representation 
has the disadvantage that points of the plane ‘are’ lines rather than points, but the 
(more than compensating) advantage that all points are alike. 

What about lines? Given a line L of R^3, not containing the origin, the set of 
lines joining its points to the origin sweep out a plane (minus one line, the ‘point 
at infinity’), which intersects Π in a line. This is the line which the painter draws to 
represent L. In other words, in the second (3-space) model, a line of the projective 
plane is a 2-dimensional subspace of R^3. Note that any two lines of the projective 
plane meet. For example, if L, L′ are lines in 3-space which are parallel but not 
in Π′, then their representations in Π meet at the point where the line through the 
origin parallel to L intersects Π. 

This gives us the clue for the general definition. The n-dimensional projective 
space over a field F, denoted PG(n, F), is defined by means of an (n+1)-dimensional 
vector space V = V(n + 1, F). The points of projective space are the 1-dimensional 
subspaces of V; the lines are the 2-dimensional subspaces; planes are 3-dimensional 
subspaces; and so on. Note that a line, normally regarded as 1-dimensional, is 
represented by a 2-dimensional vector space. We saw the motivation for this 
already; but, in an attempt to reduce confusion, we use the term k-flat for the object 
in projective geometry represented by a (k + 1)-dimensional vector subspace. 

Now some familiar geometric properties hold. For example: 


(a) Two points lie in a unique line. 
(b) Two intersecting lines lie in a unique plane. 

These properties follow from elementary linear algebra. For (a), the two points are 
1-dimensional subspaces, and their span is 2-dimensional. For (b), the two lines are 
2-dimensional subspaces U₁ and U₂; the fact that they intersect in a point means 
that dim(U₁ ∩ U₂) = 1, and so dim(U₁ + U₂) = 3, whence the two lines span a plane. 
Slightly less familiarly, the converse of (b) holds: 


(c) Two coplanar lines intersect. 


(This follows by reversing the argument, noting that dim(U₁ + U₂) = 3 implies 
dim(U₁ ∩ U₂) = 1.) In other words, there are no parallel lines! 


If F is a finite field GF(q), then we denote the projective space by PG(n, q). Now 
we can count objects in PG(n,q) in terms of Gaussian coefficients. For example: 


(9.3.1) Proposition. PG(n,q) has $\begin{bmatrix} n+1 \\ 1 \end{bmatrix}_q = (q^{n+1} - 1)/(q - 1)$ points. It has 
$\begin{bmatrix} n+1 \\ k+1 \end{bmatrix}_q$ k-flats, each of which contains $(q^{k+1} - 1)/(q - 1)$ points. 
In particular, the projective plane PG(2,q) has q² + q + 1 points and q² + q + 1 
lines; each line contains q + 1 points and each point lies in q + 1 lines; two points 
lie in a unique line, and two lines intersect in a unique point. Thus, it is an example 
of a family of sets satisfying the hypotheses and the final conclusion of the de 
Bruijn-Erdős Theorem (see Section 7.3). 
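The counting formulas above are easy to check by machine. The following Python sketch is an illustration only and not part of the original text; the function names are made up for this example. It evaluates the Gaussian coefficient as a quotient of products and uses it to count the k-flats of PG(n, q).

```python
def gaussian_binomial(n: int, k: int, q: int) -> int:
    """Gaussian coefficient [n, k]_q: the number of k-dimensional
    subspaces of an n-dimensional vector space over GF(q)."""
    if k < 0 or k > n:
        return 0
    numerator, denominator = 1, 1
    for i in range(k):
        numerator *= q ** (n - i) - 1
        denominator *= q ** (i + 1) - 1
    return numerator // denominator  # the quotient is always an integer


def pg_flat_count(n: int, k: int, q: int) -> int:
    """Number of k-flats of PG(n, q), i.e. (k+1)-dimensional subspaces
    of the (n+1)-dimensional space V(n+1, q)."""
    return gaussian_binomial(n + 1, k + 1, q)


def points_on_flat(k: int, q: int) -> int:
    """Number of points on a k-flat: (q^(k+1) - 1)/(q - 1)."""
    return (q ** (k + 1) - 1) // (q - 1)


# PG(2, q) has as many lines as points, and q + 1 points on each line.
for q in (2, 3, 4, 5):
    assert pg_flat_count(2, 0, q) == pg_flat_count(2, 1, q) == q * q + q + 1
    assert points_on_flat(1, q) == q + 1
```

For instance, `pg_flat_count(3, 1, 2)` counts the lines of PG(3, 2).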


9.4. Axioms for projective geometry 


How do we recognise a projective space? Let us assume that we are given the points 
and the lines only. (In fact, all the flats can be recovered from these data: a set 
of points is a flat if and only if it contains the unique line through any two of its 
points. See Exercise 6.) Now, as just remarked, two points lie on a unique line. But 
this alone is not enough to force the structure to be a projective space. For example, 
any Steiner triple system (Chapter 8) has this property, if we take the lines to be the 
triples; and certainly not every Steiner triple system is a projective space PG(n, q). 
(Three points per line forces q = 2, so that the total number of points would be 
$2^{n+1} - 1$. But there are Steiner triple systems where the number of points is not of 
this form.) 


In Section 8.5, we defined a class of Steiner systems which were referred to as projective. If you 
read that section, you will be reassured to know that those systems are precisely the projective spaces 
PG(n, 2). As defined there, the points are the non-zero vectors of V(n + 1, 2), and the lines are the 
triples of vectors with sum zero. But, over GF(2), a 1-dimensional space contains the zero vector 
and a unique non-zero vector, so there is a one-to-one correspondence between the non-zero vectors 
and the subspaces they span. Moreover, a 2-dimensional subspace contains the zero vector and three 
non-zero vectors; it is not hard to see that the sum of these three vectors is zero, and conversely that 
any three vectors with sum zero, together with the zero vector, form a 2-dimensional subspace. 


The correct characterisation was given by Veblen and Young, and can be stated 
as follows. 


(9.4.1) Veblen-Young Theorem. Let ℒ be a family of subsets (called lines) of the set 
X. Suppose that the following conditions hold: 
(a) every line contains at least three points; 
(b) two points of X lie in a unique line; 
(c) there exist two disjoint lines; 
(d) if a line meets two sides of a triangle, not at their intersection, then it meets the 
third side also. 
Then X and ℒ can be identified with the points and lines of the projective space 
PG(n,q) for some n ≥ 3 and some prime power q. 


Fig. 9.2. Veblen-Young axiom 

two points lie on a unique line. 

(This is slightly different from the definition we gave before, where this property 
was derived from the property that any two lines meet in a unique point; but the 
two definitions are equivalent.) 

The only possible projective plane of order 1 is a triangle; we 
assume that the order is greater than 1. The geometry PG(2, q) is a projective plane 
for any prime power q. 


We list some basic properties of projective planes. 


(9.5.1) Proposition. In a projective plane of order q, 
• any point lies on q + 1 lines; 
• two lines meet in a unique point; 
• there are q² + q + 1 lines. 


Proof. Take a point p. There are q(q + 1) points different from p; each line through 
p contains q further points, and there are no overlaps between these lines (apart 
from p). So there must be q + 1 lines through p. Now let L₁ and L₂ be two lines, and p a 
point of L₁ not on L₂. Then the q + 1 points of L₂ are all joined to p by different lines; since 
there are only q + 1 lines through p, they all meet L₂ in a point; in particular, L₁ 
meets L₂. Finally, counting pairs (p, L) with p ∈ L, we obtain 

|B| · (q + 1) = (q² + q + 1) · (q + 1), 

so |B| = q² + q + 1. 


This shows that there is a ‘duality principle’ for projective planes. Let (X, B) be 
a projective plane. Let X′ = B and $B' = \{b_x : x \in X\}$, where 

$b_x = \{L \in B : x \in L\}$; 

then (X′, B′) is also a projective plane of order q. Its points and lines correspond to 
the lines and points of the original plane. 


For which numbers q do projective planes of order q exist? We have seen that 
they exist for all prime powers. The main non-existence theorem is the celebrated 
Bruck-Ryser Theorem: 


(9.5.2) Bruck-Ryser Theorem 
If a projective plane of order n exists, where n ≡ 1 or 2 (mod 4), 
then n is the sum of two squares of integers. 


The proof is given in Section 9.8. The theorem shows, for example, that there 
is no projective plane of order 6, a fact connected with Euler’s officers, as we will 
see. However, since 10 = 1² + 3², the question of whether or not a projective 
plane of order 10 exists is not resolved by our results so far. This question was 
finally settled in the negative by Lam, Swiercz and Thiel in 1989, after several large 
computations taking a number of years. The existence question for a plane of order 
12 is unresolved at present. 
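The Bruck-Ryser condition is simple to test by machine. The sketch below is illustrative only (the function names are not from the text); it lists the orders up to 30 that the theorem rules out. Orders such as 10 and 12 pass the test, so the theorem by itself says nothing about them.

```python
from math import isqrt


def is_sum_of_two_squares(n: int) -> bool:
    """True if n = a^2 + b^2 for some integers a, b >= 0."""
    for a in range(isqrt(n) + 1):
        remainder = n - a * a
        b = isqrt(remainder)
        if b * b == remainder:
            return True
    return False


def excluded_by_bruck_ryser(n: int) -> bool:
    """True if the theorem shows that no projective plane of order n exists."""
    return n % 4 in (1, 2) and not is_sum_of_two_squares(n)


excluded = [n for n in range(2, 31) if excluded_by_bruck_ryser(n)]
print(excluded)
```

The test is only a sufficient condition for non-existence: an order that is not excluded may or may not be the order of a plane.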

How do we recognise the special planes PG(2,q)? It turns out that they are 
precisely the (finite) projective planes in which the classical theorems of Desargues¹ 
and Pappus² are valid. 


(9.5.3) Desargues’ Theorem for Π 
Let a₁b₁c₁ and a₂b₂c₂ be triangles in the projective plane Π such 
that the lines a₁a₂, b₁b₂ and c₁c₂ are concurrent. Let p = b₁c₁ ∩ b₂c₂, 
q = c₁a₁ ∩ c₂a₂, and r = a₁b₁ ∩ a₂b₂. Then p, q, r are collinear. 



1 Desargues was a contemporary of Descartes; their advocacy of geometric and algebraic methods 
respectively created a rivalry between them. 

2 Pappus was one of the last of the classical Greek geometers. His work, the Collection, was important 
in the preservation of their heritage. 


Fig. 9.3. Desargues’ Theorem 


(9.5.4) Pappus’ Theorem for Π 
Let a, b, c, d, e, f be points of the projective plane Π, such that a, c, e 
are collinear and b, d, f are collinear. Let p = ab ∩ de, q = bc ∩ ef, 
r = cd ∩ fa. Then p, q, r are collinear. 


Fig. 9.4. Pappus’ Theorem 


(9.5.5) Theorem. The following conditions are equivalent for a finite projective plane Π: 
• Π is isomorphic to PG(2, q) for some prime power q; 
• Desargues’ Theorem holds in Π; 
• Pappus’ Theorem holds in Π. 


We now develop a connection with the theory of Latin squares. First, we define 
a related geometric structure. An affine plane of order q consists of a set X of q² 
points, and a set B of q-element subsets of X called lines, such that two points lie 
on a unique line. The Steiner triple system on 9 points is an example of an affine 
plane. 

Two distinct lines of an affine plane clearly have at most one common point. 
Unlike a projective plane, lines may be disjoint. We call two lines parallel if they are 
either equal or disjoint. 


(9.5.6) Proposition. In an affine plane of order q, 
(i) any point lies on q + 1 lines; 
(ii) there are q(q + 1) lines altogether; 
(iii) (Euclid’s parallel postulate) if p is a point and L a line, there is a unique line L′ 
through p parallel to L; 
(iv) parallelism is an equivalence relation; each parallel class contains q lines which 
partition the point set. 


Proof. We begin as before. If p is a point, the q² − 1 points different from p have the 
property that each lies on a unique line through p, and each line through p contains 
q − 1 further points; so there are (q² − 1)/(q − 1) = q + 1 lines through p. Now 
double counting shows that there are q² · (q + 1)/q = q(q + 1) lines altogether. 

Let p be a point and L a line. If p ∈ L, then clearly L is the unique line through 
p parallel to itself, since any two such lines intersect in p. Suppose that p ∉ L. Then 
p lies on q + 1 lines, of which q join it to the points of L; so exactly one is disjoint 
from L. 

The relation of parallelism is, by its definition, reflexive and symmetric, and we 
have to show that it is transitive. In other words, two lines L, L′ parallel to the same 
line L″ are parallel to one another. This is clear if two of the three lines are equal, 
so suppose not. If L and L′ have a point p in common, then they both pass through 
p and are disjoint from L″, which is impossible by (iii). So L and L′ are disjoint. 

Clearly each parallel class contains exactly one line through any point. Thus, the 
q + 1 lines through a point p contain representatives of all the parallel classes. To 
see the same thing another way, observe that each parallel class contains q²/q = q 
lines, since these lines are pairwise disjoint and cover the point set; so there are 
q(q + 1)/q = q + 1 parallel classes. 


(9.5.7) Theorem. A projective plane of order q exists if and only if an affine plane 
of order q exists. 


Proof. We have to construct each type of plane from the other. Suppose that (X, B) 
is a projective plane. Let L be a line, and set X₀ = X \ L and 

B₀ = {L′ \ L : L′ ∈ B, L′ ≠ L}. 

(In other words, we remove a line and all of its points.) There are (q² + q + 1) − (q + 1) = 
q² points in X₀; each line has (q + 1) − 1 = q points, since any line meets L in a 
unique point; and two points lie in a unique line. So (X₀, B₀) is an affine plane. 

Conversely, suppose that (X₀, B₀) is an affine plane. Let Y be the set of 
parallel classes of lines in this plane. We take the point set X to be X₀ ∪ Y; then 
|X| = q² + q + 1. There are two types of new lines. For each line L ∈ B₀, set 
L* = L ∪ {C}, where C is the parallel class containing L; also take Y as a new line. 
Thus the new structure is (X, B), where 

B = {L* : L ∈ B₀} ∪ {Y}. 

Any line has q + 1 points, since one new point is added to each old line L, and 
there are q + 1 parallel classes. We have to show that two points of X lie in a unique 
line. There are several cases: 

• two points x, y in X₀ lie in a unique old line, hence a unique new line of the 
first kind; 

• given a point x ∈ X₀ and a parallel class C ∈ Y, there is a unique line containing 
x in the parallel class C, hence a unique new line of the first kind containing 
both; 

• two parallel classes lie in a unique new line of the second type, namely Y. 

So (X, B) is a projective plane. 

The process used above to extend an affine plane to a projective plane is called 
‘adding a line at infinity’. The line Y is the line at infinity, and its points are the points 
at infinity, the points where parallel lines of the affine plane meet. This is exactly the 
procedure which turns the Euclidean ‘picture plane’ into the real projective plane. 
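The process of adding a line at infinity can be carried out mechanically. The following Python sketch is an illustration under stated assumptions, not the text's own construction: it takes the affine plane to be the coordinate plane over the integers mod a prime p (with lines y = mx + c and x = c), labels the ideal points by hypothetical tags `("inf", i)`, and checks that the result satisfies the defining property of a projective plane.

```python
from itertools import combinations


def affine_plane(p: int):
    """Points and parallel classes of the affine plane over the integers
    mod p (p is assumed to be prime)."""
    points = [(x, y) for x in range(p) for y in range(p)]
    classes = []  # one list of p pairwise disjoint lines per parallel class
    for m in range(p):  # the lines y = m x + c of slope m
        classes.append([frozenset((x, (m * x + c) % p) for x in range(p))
                        for c in range(p)])
    # the lines x = c of 'infinite slope'
    classes.append([frozenset((c, y) for y in range(p)) for c in range(p)])
    return points, classes


def add_line_at_infinity(points, classes):
    """Adjoin one new point per parallel class, and the line Y of new points."""
    ideal = [("inf", i) for i in range(len(classes))]
    lines = [line | {ideal[i]} for i, cls in enumerate(classes) for line in cls]
    lines.append(frozenset(ideal))
    return points + ideal, lines


def two_points_on_unique_line(points, lines) -> bool:
    """True if every pair of distinct points lies on exactly one line."""
    return all(sum(1 for L in lines if a in L and b in L) == 1
               for a, b in combinations(points, 2))
```

For p = 3 this turns the 9-point affine plane into a structure with 13 points and 13 lines, each line having 4 points.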


We now make the connection with orthogonal Latin squares, and exhibit affine 
planes as the solution of a different kind of extremal problem. Recall the definition 
of a Latin square of order n (from Chapter 6): it is an n × n matrix with entries 
1, 2, ..., n, having the property that each entry occurs exactly once in each row or 
column. Also, two Latin squares $A = (a_{ij})$ and $B = (b_{ij})$ are orthogonal if, for 
any pair (k, l) of elements from {1, ..., n}, there are unique values of i and j such 
that $a_{ij} = k$, $b_{ij} = l$. A set $\{A_1, \ldots, A_r\}$ of Latin squares is called a set of mutually 
orthogonal Latin squares (MOLS) if any two squares in the set are orthogonal. We 
saw that there cannot be more than n − 1 MOLS of order n. 


(9.5.8) Theorem. There exist n — 1 MOLS of order n if and only if there is an affine 
plane of order n. 


Proof. Given a set $\{A_1, \ldots, A_r\}$ of MOLS, we build a geometry of points and lines 
resembling a ‘partial affine plane’. We take the points to be the cells of an n × n 
array: 

X = {(i, j) : i, j = 1, ..., n}. 

There are three types of lines: 
(a) horizontal lines, of the form {(x, j) : x = 1, ..., n}, where j is fixed (j = 1, ..., n); 
(b) vertical lines, of the form {(i, y) : y = 1, ..., n}, where i is fixed (i = 1, ..., n); 
(c) for each square $A_m$ (m = 1, ..., r), and for each entry k (k = 1, ..., n), the set 

$\{(i, j) : (A_m)_{ij} = k\}$. 

Clearly there are n² points, and any line contains n points. 


We claim that two points lie on at most one line. This is clear for horizontal or 
vertical lines; and the definition of a Latin square guarantees that two points of a 
type (c) line lie in different rows and columns. Furthermore, lines of type (c) coming 
from the same square $A_m$ are disjoint. So consider two lines of type (c), defined by 
square $A_{m_1}$ and entry $k_1$ and by square $A_{m_2}$ and entry $k_2$ respectively. Could they 
have two points $(i_1, j_1)$ and $(i_2, j_2)$ in common? If so, then in both these positions 
the square $A_{m_1}$ has entry $k_1$ and $A_{m_2}$ has entry $k_2$, contradicting orthogonality. 


Now any point p lies on r + 2 lines: one horizontal, one vertical, and one for 
each of the squares. These lines contain (r + 2)(n − 1) points other than p. So 
1 + (r + 2)(n − 1) ≤ n², whence r ≤ n − 1, giving another proof (more-or-less the 
same as the earlier one) of the upper bound. Equality holds if and only if any two 
points lie on a line, that is, the geometry is an affine plane. 


Conversely, suppose that an affine plane of order n exists. It has n² points 
and n + 1 parallel classes of lines. We select two parallel classes $\{H_1, \ldots, H_n\}$ and 
$\{V_1, \ldots, V_n\}$ of lines (to be the horizontal and vertical lines). Now any point lies on 
a unique horizontal line $H_j$ and a unique vertical line $V_i$; we can give this point the 
coordinates (i, j). 

Now let $\{L_1, \ldots, L_n\}$ be any further parallel class, and define a matrix A by the 
rule that $A_{ij} = k$ if and only if $(i, j) \in L_k$. It is easily checked that this matrix is a 
Latin square. Furthermore, the matrices obtained from different parallel classes are 
orthogonal. So we obtain a set of n − 1 MOLS from our affine plane. 


REMARK. Given any set of r MOLS of order n, a ‘geometry’ can be constructed 
as in the above proof. It has n² points and n(r + 2) lines, with each line having n 
points, two points in at most one line, and the lines falling into r + 2 parallel classes. 
Such a geometry is called a net. 
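The construction in the proof can be tried out on a concrete family of squares. The sketch below is illustrative and makes two assumptions that are not taken from this section: it uses the familiar squares with (i, j) entry mi + j (mod p) for a prime p and m = 1, ..., p − 1, and it numbers rows, columns and entries from 0 rather than 1. It checks that the squares are mutually orthogonal and that the resulting lines form an affine plane.

```python
from itertools import combinations


def mols_prime(p: int):
    """The p - 1 squares with (i, j) entry (m*i + j) mod p, m = 1..p-1.
    Assumes p is prime; entries are 0..p-1 rather than 1..p."""
    return [[[(m * i + j) % p for j in range(p)] for i in range(p)]
            for m in range(1, p)]


def is_latin(square) -> bool:
    n = len(square)
    symbols = set(range(n))
    rows_ok = all(set(row) == symbols for row in square)
    cols_ok = all({square[i][j] for i in range(n)} == symbols for j in range(n))
    return rows_ok and cols_ok


def are_orthogonal(a, b) -> bool:
    """Every ordered pair of entries occurs in exactly one cell."""
    n = len(a)
    return len({(a[i][j], b[i][j]) for i in range(n) for j in range(n)}) == n * n


def net_lines(squares, n: int):
    """Horizontal lines, vertical lines, and one line per square and entry."""
    lines = [frozenset((i, j) for i in range(n)) for j in range(n)]   # j fixed
    lines += [frozenset((i, j) for j in range(n)) for i in range(n)]  # i fixed
    for square in squares:
        for k in range(n):
            lines.append(frozenset((i, j) for i in range(n) for j in range(n)
                                   if square[i][j] == k))
    return lines
```

With n − 1 squares the net has n(n + 1) lines, and any two cells lie on exactly one of them; with fewer squares, some pairs of cells lie on no line at all.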


9.6. Other kinds of geometry 


Finite geometers have produced a bewildering variety of new types of geometries, 
usually defined by lists of axioms: affine spaces, polar spaces (and affine polar 
spaces), partial and semi-partial geometries, generalised polygons, near-polygons, 
buildings, etc. In this section, I will say a little about two of these types, which are 
closely related to projective spaces. 


We have already seen the relation between projective and affine planes. Not 
surprisingly, the same can be done in any dimension. We define the n-dimensional 
affine geometry AG(n,q) over the field GF(q) to be obtained from the projective 
geometry PG(n, q) by designating a hyperplane H (a subspace of codimension 1) as 
being ‘at infinity’ and deleting it, together with all the subspaces it contains. 

Just as in the plane case, there is a cartesian representation. If the underlying 
vector space V(n + 1, q) consists of vectors with coordinates $(x_1, \ldots, x_{n+1})$, we can 
take the hyperplane at infinity to have equation $x_{n+1} = 0$; then any non-infinite 
point has a unique representative with $x_{n+1} = 1$, say $(x_1, \ldots, x_n, 1)$, and we can 
represent it uniquely by the n-tuple $(x_1, \ldots, x_n)$. We can regard this as a vector of 
V(n, q). Now the whole geometry can be represented in V = V(n, q), as follows: 
k-flats turn out to be all cosets W + v of k-dimensional vector subspaces W of 
V. (This works even for points: the only 0-dimensional subspace is {0}, and its 
cosets are all the singleton sets {v}, which can be identified with individual vectors 
v ∈ V.) Now it is clear that a flat of dimension k contains $q^k$ points. The number 
of such flats is $q^{n-k} \begin{bmatrix} n \\ k \end{bmatrix}_q$: for there are $\begin{bmatrix} n \\ k \end{bmatrix}_q$ choices of the vector subspace W, and 
$q^n$ choices of the coset representative v, but $q^k$ of these give rise to the same coset. 
Summarising: 


(9.6.1) Proposition. AG(n, q) has $q^n$ points. It has $q^{n-k} \begin{bmatrix} n \\ k \end{bmatrix}_q$ flats of dimension k, each 
of which contains $q^k$ points. 
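As a sanity check, the count of flats in AG(n, q) can be compared with the count of lines in an affine plane obtained earlier. This is a small illustrative sketch; the function names are chosen for the example only.

```python
def gaussian_binomial(n: int, k: int, q: int) -> int:
    """Number of k-dimensional subspaces of an n-dimensional space over GF(q)."""
    if k < 0 or k > n:
        return 0
    numerator, denominator = 1, 1
    for i in range(k):
        numerator *= q ** (n - i) - 1
        denominator *= q ** (i + 1) - 1
    return numerator // denominator


def ag_flat_count(n: int, k: int, q: int) -> int:
    """Number of k-dimensional flats of AG(n, q): q^(n-k) cosets for each
    of the k-dimensional vector subspaces W."""
    return q ** (n - k) * gaussian_binomial(n, k, q)


# In the plane (n = 2, k = 1) this is the familiar count q(q + 1) of lines.
for q in (2, 3, 4, 5, 7):
    assert ag_flat_count(2, 1, q) == q * (q + 1)
    assert ag_flat_count(2, 0, q) == q * q
```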


There are theorems about recognition of affine spaces, like the Veblen-Young 
Theorem but more complicated. We won't pursue this any further (but see the 
discussion of affine Steiner triple systems in Section 8.5). 


Now we examine briefly a class of geometries which axiomatise (among other 
things) the nets, which arose in connection with orthogonal Latin squares and affine 
planes in Section 9.5. 


Let s, t, α be positive integers. A partial geometry with parameters s, t, α is a 
geometry of points and lines for which the following axioms hold: 
• every line is incident with s + 1 points, and any point with t + 1 lines; 
• two points are incident with at most one line (and two lines with at most one 
point; but this is equivalent to the preceding!); 
• if the point p is not incident with the line L, then there are exactly α points of L 
collinear with p (or, equivalently, exactly α lines through p concurrent with L). 
The comments in parentheses demonstrate that the dual of a partial geometry with 
parameters s, t, α is a partial geometry with parameters t, s, α. (The dual is defined in 
the same way as for projective planes in Section 9.5.) Note that 1 ≤ α ≤ min(s, t) + 1. 

Part of the motivation for studying partial geometries is that they include many 
other types of structure as special cases. Let us just notice two cases. 


A partial geometry with α = s + 1 has the property that any two points lie on a 
unique line. (For let p and q be points, and L a line containing q. If L also contains 
p, we're done; else, by the third axiom and the fact that α = |L|, every point of L 
(and in particular q) is collinear with p.) Conversely, a structure in which two points 
lie on a unique line and every line has a constant number of points is a partial 
geometry with α = s + 1. These include projective and affine planes, projective and 
affine spaces of arbitrary dimension (where lines are 1-flats), Steiner triple systems, 
and complete graphs (with two points per line). 


A net (obtained from a family of r − 2 MOLS, as in Section 9.5) is a partial geometry 
with s = n − 1, t = r − 1, α = r − 1. (The parameters s and t are clear. Now, if p 
is not on the line L, then every line through p meets L except for the unique line 
parallel to L.) 

Conversely, let G be a partial geometry with α = t. Let n = s + 1 and r = t + 1. 
Calling two lines parallel if they are equal or disjoint, we see that, given any point 
p and line L, there is a unique line L′ through p parallel to L. Hence parallelism 
is an equivalence relation, and each parallel class covers all the points of G once. 
Now every line has n points. It follows that every parallel class has n lines (since a 
line not in that parallel class meets each line in the class once), and so there are n² 
points altogether. Thus G is a net. 

We conclude that nets are the same as partial geometries with α = t. In 
particular, α = t = 1 defines a square grid. 


A very important kind of partial geometry consists of generalised quadrangles, 
defined by the condition that α = 1. We see that square grids are generalised 
quadrangles; but there are many others. Exercise 12 gives a simple construction of 
one. 
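The axioms of a partial geometry can be checked directly for a small example. The following Python sketch is an illustration only, with made-up function names: it computes the parameters (s, t, α) of a point-line structure when the axioms hold, and applies the test to a square grid whose lines are the rows and columns.

```python
from itertools import combinations


def partial_geometry_parameters(points, lines):
    """Return (s, t, alpha) if the three axioms hold, and None otherwise."""
    line_sizes = {len(L) for L in lines}
    degrees = {sum(1 for L in lines if p in L) for p in points}
    if len(line_sizes) != 1 or len(degrees) != 1:
        return None  # first axiom fails
    if any(len(L & M) > 1 for L, M in combinations(lines, 2)):
        return None  # two points on more than one line
    alphas = set()
    for p in points:
        through_p = [M for M in lines if p in M]
        for L in lines:
            if p not in L:
                # number of lines through p that meet L
                alphas.add(sum(1 for M in through_p if M & L))
    if len(alphas) != 1:
        return None  # third axiom fails
    return line_sizes.pop() - 1, degrees.pop() - 1, alphas.pop()


def square_grid(n: int):
    """The n x n grid: points are cells, lines are rows and columns."""
    points = [(i, j) for i in range(n) for j in range(n)]
    rows = [frozenset((i, j) for j in range(n)) for i in range(n)]
    cols = [frozenset((i, j) for i in range(n)) for j in range(n)]
    return points, rows + cols
```

For the grid the function reports s = n − 1, t = 1 and α = 1, so the grid is a net with two parallel classes and also a generalised quadrangle.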


9.7. Project: Coordinates and configurations 


As you might expect, the projective planes PG(2,q) have many special properties 
not shared by arbitrary planes. The proofs of these properties must involve the 
algebraic structure: in other words, we work with coordinates rather than with the 
geometric configurations they represent. In this section, we will see how to set up 
coordinates, and then use them to prove one of the most famous theorems of finite 
geometry, Segre’s Theorem. 


Let F = GF(q). The points of PG(2,q) are 1-dimensional subspaces of the 
vector space V = V(3, F). Each point is spanned by a non-zero vector (x, y, z); but, 
of course, any non-zero multiple (cx, cy, cz) would span the same point. We use the 
notation [x, y, z] for the point spanned by (x, y, z), so that [x, y, z] = [cx, cy, cz] for 
any c ≠ 0. Then x, y, z are called homogeneous coordinates for the point. 

(An alternative procedure would be to call two non-zero vectors equivalent if 
one is a constant multiple of the other, and then define points to be equivalence 
classes of vectors.) 

Any line can be represented by a linear equation ax + by + cz = 0, where a, b, c 
are not all zero. We see that multiplying a,b,c by a constant doesn't change the 
set of points on the line; so we can also represent lines by equivalence classes 
(or 1-dimensional subspaces) [a,b,c]. (In algebraic terms, lines, or 2-dimensional 
subspaces of V, are represented by 1-dimensional subspaces of the dual space V*.) 

We can find unique representatives of the points and lines at the cost of 
distinguishing cases. For this purpose, we take the line z = 0 (represented by 
[0,0,1]) to be the line at infinity. Now any point not on this line (i.e., in the 
affine plane) has z ≠ 0, and so has a unique representative [x, y, 1] (obtained by 
multiplying through by the inverse of the third coordinate): this corresponds to the 
usual Cartesian coordinates (x, y) in the affine plane. There are q² points of this 
form. Similarly, points on the line z = 0 either have x ≠ 0 (in which case there is a 
unique representative [1, m, 0]), or have x = 0 as well (there is a unique such point, 
namely [0,1,0]). This gives the q + 1 points on the line at infinity, making q² + q + 1 
points altogether. 

Now we consider the lines. One of them is the line at infinity, [0,0,1]. For 
most other lines, as usual in coordinate geometry, we can take the equation to be 
y = mx + c: this line has slope m and y-intercept c in the standard way. (Its 
affine points are those [x, y, 1] for which x and y satisfy this equation.) In terms of 
homogeneous coordinates, the equation is y = mx + cz, or [m, −1, c]; it contains the 
point [1, m, 0] of the line at infinity. The remaining lines (those with ‘infinite slope’) 
have equation x = c, which in homogeneous coordinates is x = cz or [−1, 0, c]; they 
pass through the point [0, 1, 0]. 
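The choice of unique representatives described above translates directly into code. The sketch below is illustrative and restricted to a prime field: it assumes q = p is prime, so that arithmetic in GF(p) is arithmetic mod p, and the function names are hypothetical. The modular inverse is computed with Python's built-in `pow(a, -1, p)`, available from Python 3.8.

```python
def normalise(v, p: int):
    """Unique representative of the point [x, y, z] of PG(2, p), p prime:
    [x, y, 1] if z != 0; otherwise [1, m, 0] if x != 0; otherwise [0, 1, 0]."""
    x, y, z = (c % p for c in v)
    for pivot in (z, x, y):
        if pivot:
            inverse = pow(pivot, -1, p)
            return tuple(c * inverse % p for c in (x, y, z))
    raise ValueError("the zero vector does not span a point")


def points_of_plane(p: int):
    """All points of PG(2, p), each listed once by its representative."""
    vectors = [(x, y, z) for x in range(p) for y in range(p) for z in range(p)]
    return sorted({normalise(v, p) for v in vectors if v != (0, 0, 0)})


def on_line(line, point, p: int) -> bool:
    """Does [x, y, z] lie on the line [a, b, c], i.e. ax + by + cz = 0?"""
    return sum(a * x for a, x in zip(line, point)) % p == 0
```

Since lines are also given by triples [a, b, c] up to scalar multiples, the same list of representatives serves for the lines; for example, the line y = mx + c is the triple (m, −1, c), and `on_line((m, -1, c), (1, m, 0), p)` confirms that it passes through [1, m, 0].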


We are going to find all the ovals in the planes PG(2,q) with q odd. First we have to define 
ovals, and prove a few of their properties. 


An oval in a projective plane is a set O of points with the properties that no three of its points 
are collinear, and it has a unique tangent at each of its points (a line meeting it in no further 
point). It's clear that this definition is abstracted from the intuitive notion of an oval in the real 
plane (exemplified by any smooth convex curve); but intuition doesn’t always serve us well in finite 
geometry. 

Given an oval O, any line of the plane meets O in at most two points; we call a line L a secant, 
tangent or passant according as |L ∩ O| = 2, 1 or 0. If p is a point of O, then p lies on q + 1 lines 
(where q is the order of the plane), of which one is a tangent and the other q are secants, each 
containing one further point of O; so |O| = q + 1. 

In PG(2, q), there is an important special class of ovals, called conics. A conic C is the set of 
points satisfying a non-singular quadratic equation: thus 


C = {[x, y, z] : ax² + by² + cz² + fyz + gzx + hxy = 0}, 


where the quadratic form is non-singular (this means that it cannot be transformed into a form in 
fewer than three variables by any non-singular linear substitution of the variables x, y, z). Note that, 
because every term in the quadratic form has degree 2, if (x, y, z) satisfies the equation, so does 
(cx, cy, cz); so our definition does make sense. 

Any conic is an oval. To see this, take a line L, which (by choice of coordinates, i.e., a 
linear substitution) we can assume is the line z = 0. The points of C ∩ L are those [x, y, 0] for 
which ax² + by² + hxy = 0. Now we cannot have a = b = h = 0; for then the quadratic would be 
z(gx + fy + cz) = 0, and a linear substitution would change it to xz = 0, involving only two variables. 
If a ≠ 0, then the point [1, 0, 0] doesn’t satisfy the equation; any other point has a representative 
[x, 1, 0], and lies on the conic if and only if ax² + hx + b = 0, and this quadratic equation has at 
most two solutions. The argument is similar if b ≠ 0. Finally, if a = b = 0, the equation is hxy = 0, 
and there are two points which satisfy it, namely [1, 0, 0] and [0, 1, 0]. 


In the affine plane, there are three familiar types of conic: the ellipse, parabola, and hyperbola. 
But the three are equivalent in the projective plane. If we take a conic C in PG(2, q), and choose 
a line L to be the line at infinity, then the conic becomes a hyperbola, parabola or ellipse in the 
usual fashion if L is a secant, tangent or passant respectively. For example, consider the conic with 
equation xy = z². If we choose z = 0 to be the line at infinity, the affine form of the equation is 
xy = 1, a hyperbola (put z = 1); if y = 0 is the line at infinity, the affine form is x = z², a parabola. 
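For a small prime p, the claim that the conic xy = z² is an oval can be verified by exhaustive search. The sketch below is only an illustration: it assumes q = p is prime, represents points and lines by the normalised triples [x, y, 1], [1, m, 0], [0, 1, 0], and uses hypothetical function names.

```python
from itertools import combinations


def plane(p: int):
    """Normalised representatives of the points of PG(2, p), p prime."""
    return ([(x, y, 1) for x in range(p) for y in range(p)]
            + [(1, m, 0) for m in range(p)] + [(0, 1, 0)])


def incident(line, point, p: int) -> bool:
    """Does the point [x, y, z] lie on the line ax + by + cz = 0?"""
    return sum(a * x for a, x in zip(line, point)) % p == 0


def collinear(u, v, w, p: int) -> bool:
    """Three points are collinear iff their coordinate determinant vanishes."""
    det = (u[0] * (v[1] * w[2] - v[2] * w[1])
           - u[1] * (v[0] * w[2] - v[2] * w[0])
           + u[2] * (v[0] * w[1] - v[1] * w[0]))
    return det % p == 0


def is_oval(points, p: int) -> bool:
    """No three points collinear, and exactly one tangent at each point."""
    if any(collinear(u, v, w, p) for u, v, w in combinations(points, 3)):
        return False
    # lines [a, b, c] are listed by the same normalised triples as points
    tangents = [l for l in plane(p)
                if sum(incident(l, c, p) for c in points) == 1]
    return all(sum(incident(l, pt, p) for l in tangents) == 1 for pt in points)


def conic(p: int):
    """Points of the conic xy = z^2 in PG(2, p)."""
    return [pt for pt in plane(p) if (pt[0] * pt[1] - pt[2] ** 2) % p == 0]
```

The conic has p − 1 affine points on the hyperbola xy = 1 together with the two points [1, 0, 0] and [0, 1, 0] on the line z = 0, giving p + 1 points in all.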


(9.7.1) Segre’s Theorem. If q is an odd prime power, then any oval in PG(2,q) is a conic. 


Proof. Let O be an oval. We begin with some combinatorial analysis which applies in any plane of 
odd order; then we introduce coordinates. 


Step 1. Any point not on O lies on 0 or 2 tangents. 

Let p be a point not on O. Since |O| = q + 1 is even, and an even number of points lie on 
secants through p, an even number must lie on tangents also. Let $x_i$ be the number of points outside 
O which lie on i tangents. Now we have 

\[ \sum_i x_i = q^2, \]
\[ \sum_i i x_i = (q + 1)q, \]
\[ \sum_i i(i - 1) x_i = (q + 1)q. \]

(These are all obtained by double counting. The first holds because there are q² points outside O; 
the second because there are q + 1 tangents (one at each point of O), each containing q points not 
on O; and the third because any two tangents intersect at a unique point outside O.) 

From these equations, we see that $\sum_i i(i - 2) x_i = 0$. But the term i = 1 in the sum vanishes (any 
point lies on an even number of tangents); the terms i = 0 and i = 2 clearly vanish, and i(i − 2) > 0 
for any other value of i. So $x_i = 0$ for all i ≠ 0 or 2, proving the assertion. 
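Step 1 can be observed directly in a small case. The following sketch is an illustration, not part of the proof: it assumes q = p is an odd prime, takes the conic xy = z² as the oval, and counts the tangents through every point off the oval. The function names are invented for the example.

```python
def plane(p: int):
    """Normalised representatives of the points (and lines) of PG(2, p)."""
    return ([(x, y, 1) for x in range(p) for y in range(p)]
            + [(1, m, 0) for m in range(p)] + [(0, 1, 0)])


def incident(line, point, p: int) -> bool:
    return sum(a * x for a, x in zip(line, point)) % p == 0


def tangent_counts(p: int):
    """Map each point off the conic xy = z^2 to the number of tangents
    of the conic passing through it."""
    pts = plane(p)
    oval = {pt for pt in pts if (pt[0] * pt[1] - pt[2] ** 2) % p == 0}
    tangents = [l for l in pts if sum(incident(l, c, p) for c in oval) == 1]
    return {pt: sum(incident(l, pt, p) for l in tangents)
            for pt in pts if pt not in oval}


counts = tangent_counts(7)
# Only the values 0 and 2 occur, as Step 1 asserts.
assert set(counts.values()) <= {0, 2}
```

The double counting equations also determine how many points of each kind there are: the second equation gives q(q + 1)/2 points on two tangents, and the first then leaves q(q − 1)/2 points on none.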


Remark. Points not on O are called exterior points or interior points according as they lie on 2 or 0 
tangents, by analogy with the real case. But the analogy goes no further. In the real case, every line 
through an interior point is a secant; this is false for finite planes. (Can you count the number of 
secants through a point of each type?) 

Step 2. The product of all the non-zero elements of GF(q) is equal to −1. 

The solutions of the quadratic x² = 1 are x = 1 and x = −1; these are the only elements equal 
to their multiplicative inverses. So, in the product of all the non-zero elements, everything except 1 
and −1 pairs off with its inverse, leaving these two elements unpaired. 
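For prime fields, Step 2 is easy to confirm numerically. The sketch below is illustrative and covers only the case where q = p is an odd prime, so that GF(p) is the integers mod p and −1 is represented by p − 1; the function names are hypothetical.

```python
from math import prod


def product_of_nonzero_elements(p: int) -> int:
    """Product of 1, 2, ..., p - 1 reduced mod p (p assumed prime)."""
    return prod(range(1, p)) % p


def self_inverse_elements(p: int):
    """Non-zero elements equal to their own multiplicative inverse."""
    return [x for x in range(1, p) if x * x % p == 1]


for p in (3, 5, 7, 11, 13):
    assert product_of_nonzero_elements(p) == p - 1   # that is, -1 in GF(p)
    assert self_inverse_elements(p) == [1, p - 1]    # only 1 and -1
```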


For the next two steps, note that we can choose the coordinate system so that the sides of a 
given triangle have equations x = 0, y = 0 and z = 0 (and the opposite vertices are [1, 0, 0], [0, 1, 0], 
and [0, 0, 1] respectively). We'll call this the triangle of reference. 

Step 3. Suppose that concurrent lines through the vertices of the triangle of reference meet the 
opposite sides in the points [0, 1, a], [b, 0, 1], and [1, c, 0]. Then abc = 1. 

(The equations of the concurrent lines are z = ay, x = bz and y = cx respectively; the point of 
concurrency must satisfy all three equations, whence abc = 1.) 


REMARK. This result is equivalent to the classical Theorem of Menelaus. 


Step 4. Let the vertices of the triangle of reference be chosen to be three points of O, and let the 
tangents at these points have equations z = ay, x = bz and y = cx respectively. Then abc = −1. 

Proof: There are q − 2 further points of O, say $p_1, \ldots, p_{q-2}$. Consider the point [1, 0, 0]. It 
lies on the tangent z = ay, meeting the opposite side in [0, 1, a]; two secants which are sides of 
the triangle; and q − 2 further secants, through $p_1, \ldots, p_{q-2}$. Let the secant through $p_i$ meet the 
opposite side in $[0, 1, a_i]$. Then $a \prod_{i=1}^{q-2} a_i = -1$, by Step 2. If $b_i$, $c_i$ are similarly defined, we have also 
$b \prod_{i=1}^{q-2} b_i = c \prod_{i=1}^{q-2} c_i = -1$. Thus 

\[ abc \prod_{i=1}^{q-2} (a_i b_i c_i) = -1. \]

But, by Step 3, $a_i b_i c_i = 1$ for i = 1, ..., q − 2; so abc = −1. 
Step 5. Given any three points p, q, r of O, there is a conic C passing through p, q, r and having the 
same tangents at these points as does O. 

Proof: Choosing coordinates as in Step 4, the conic with equation 


yz − czx + caxy = 0 


can be checked to have the required property. (For example, [1, 0, 0] lies on this conic; and, putting 
z = ay, we obtain ay² = 0, so [1, 0, 0] is the unique point of the conic on this line.) 
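The check left to the reader in Step 5 can be done by brute force over a small prime field. This sketch is illustrative only: it assumes q = p is an odd prime, runs over all non-zero a and b, chooses c so that abc = −1, and verifies that each of the three prescribed lines meets the conic yz − czx + caxy = 0 only at the corresponding vertex of the triangle of reference. The names are hypothetical.

```python
def plane(p: int):
    """Normalised representatives of the points of PG(2, p), p prime."""
    return ([(x, y, 1) for x in range(p) for y in range(p)]
            + [(1, m, 0) for m in range(p)] + [(0, 1, 0)])


def incident(line, point, p: int) -> bool:
    return sum(a * x for a, x in zip(line, point)) % p == 0


def step5_holds(p: int) -> bool:
    pts = plane(p)
    vertices = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
    for a in range(1, p):
        for b in range(1, p):
            c = -pow(a * b, -1, p) % p  # chosen so that abc = -1 in GF(p)
            conic = [(x, y, z) for (x, y, z) in pts
                     if (y * z - c * z * x + c * a * x * y) % p == 0]
            # the lines z = ay, x = bz, y = cx as coefficient triples
            tangents = [(0, a, -1), (1, 0, -b), (c, -1, 0)]
            for vertex, line in zip(vertices, tangents):
                if [pt for pt in conic if incident(line, pt, p)] != [vertex]:
                    return False
    return True
```

The condition abc = −1 from Step 4 is exactly what makes the second and third lines tangent: substituting x = bz into the equation gives yz(1 + abc) − bcz² = 0.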


Step 6. Now we are finished if we can show that the conic C of Step 5 passes through an arbitrary 
further point s of O. 

Let C′ and C″ be the conics passing through p, q, s and p, r, s respectively and having the correct 
tangents there. Let the conics C, C′ and C″ have equations f = 0, f′ = 0, f″ = 0 respectively. (These 
equations are determined up to a constant factor.) Let $L_p, L_q, L_r, L_s$ be the tangents to O at p, q, r, s 
respectively. Since all three conics are tangent to $L_p$ at p, we can choose the normalisation so that 
f, f′, f″ agree identically on $L_p$. 

Now consider the restrictions of f′ and f″ to $L_s$. Both are quadratic functions having a double 
zero at s, and the values at the point $L_s \cap L_p$ coincide; so the two functions agree identically on $L_s$. 
Similarly, f and f′ agree on $L_q$, and f and f″ agree on $L_r$. But then f, f′ and f″ all agree at the 
point $L_q \cap L_r$. So the quadratic functions f′ and f″ agree on $L_p$, $L_s$, and $L_q \cap L_r$, which forces them 
to be equal. So the three conics coincide, and our claim is proved (and with it Segre’s Theorem). 


9.8. Project: Proof of the Bruck-Ryser Theorem 


In this section, we prove the Bruck-Ryser Theorem: 


if n ≡ 1 or 2 (mod 4) and a projective plane of order n exists, then 
n is a sum of two squares of integers. 


The proof uses a fair amount of number theory. It also has a very ad hoc appearance; 
you may wonder how anybody ever thought of it! In fact, there are deeper and 
more general number-theoretic regions lying hidden here, for relating integer zeros 
of quadratic forms to zeros modulo primes, going by the name of Hasse-Minkowski 
theory, which have important applications in combinatorics. The argument here can 
be regarded as the general argument translated into a simpler form in the special 
case. 


We need four 'facts' from number theory. Proofs and discussions of these will be given after the
proof of the Bruck-Ryser Theorem.


Fact 1. The 'four-squares identity':

$$(a_1^2 + a_2^2 + a_3^2 + a_4^2)(x_1^2 + x_2^2 + x_3^2 + x_4^2) = y_1^2 + y_2^2 + y_3^2 + y_4^2,$$

where

$$y_1 = a_1x_1 - a_2x_2 - a_3x_3 - a_4x_4,$$
$$y_2 = a_1x_2 + a_2x_1 + a_3x_4 - a_4x_3,$$
$$y_3 = a_1x_3 + a_3x_1 + a_4x_2 - a_2x_4,$$
$$y_4 = a_1x_4 + a_4x_1 + a_2x_3 - a_3x_2.$$
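
The identity of Fact 1 is easy to test numerically. The following is only a sketch of such a check: the function names (`four_square_product`, `norm`, `check_identity`) are illustrative and not taken from the text, and a random test is of course no substitute for the 'straightforward calculation'.

```python
import random

def four_square_product(a, x):
    """Return (y1, y2, y3, y4) as defined in Fact 1."""
    a1, a2, a3, a4 = a
    x1, x2, x3, x4 = x
    y1 = a1 * x1 - a2 * x2 - a3 * x3 - a4 * x4
    y2 = a1 * x2 + a2 * x1 + a3 * x4 - a4 * x3
    y3 = a1 * x3 + a3 * x1 + a4 * x2 - a2 * x4
    y4 = a1 * x4 + a4 * x1 + a2 * x3 - a3 * x2
    return (y1, y2, y3, y4)

def norm(v):
    """Sum of the squares of the entries of v."""
    return sum(t * t for t in v)

def check_identity(trials=1000, seed=0):
    """Test the four-squares identity on random integer 4-tuples."""
    rng = random.Random(seed)
    for _ in range(trials):
        a = [rng.randint(-50, 50) for _ in range(4)]
        x = [rng.randint(-50, 50) for _ in range(4)]
        if norm(a) * norm(x) != norm(four_square_product(a, x)):
            return False
    return True
```

For instance, with a = (1, 2, 3, 4) and x = (5, 6, 7, 8) the sums of squares are 30 and 174, and the y's are (-60, 12, 30, 24), whose squares sum to 5220 = 30 · 174.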
Fact 2. If $p$ is an odd prime, and there exist integers $x_1, x_2$, not both divisible by $p$, such that
$x_1^2 + x_2^2 \equiv 0 \pmod p$, then $p$ is the sum of two integer squares. The analogous result holds for four
squares.
Fact 3. Every positive integer is the sum of four integer squares. 


Fact 4. For any integer $n$, if the equation $x^2 + y^2 = nz^2$ has an integer solution with $x, y, z$ not all
zero, then $n$ is the sum of two integer squares (that is, the equation has a solution with $z = 1$).


PROOF OF BRUCK-RYSER THEOREM. Suppose that there is a projective plane of order $n$, where $n \equiv 1$
or $2 \pmod 4$. The number of points of the plane is $N = n^2 + n + 1$; and we see that $N \equiv 3 \pmod 4$.
Let $A$ be an incidence matrix of the plane, an $N \times N$ matrix with rows indexed by points and
columns by lines, with $(i, j)$ entry equal to 1 if the $i$th point is on the $j$th line, 0 otherwise. Then
$AA^T$ has $(i, j)$ entry equal to the number of lines containing the $i$th and $j$th points, which is $n + 1$
if $i = j$, and 1 otherwise; that is,
$$AA^T = nI + J,$$


where J is the matrix with every entry 1. 
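
As a sanity check on the equation $AA^T = nI + J$, here is a small sketch for the plane of order 2. The labelling of the seven lines in `FANO_LINES` is one common choice and is an assumption of this example, as are the helper names.

```python
# One standard labelling of the seven lines of the plane of order 2
# (an assumption of this sketch; any isomorphic labelling works).
FANO_LINES = [(1, 2, 3), (1, 4, 5), (1, 6, 7), (2, 4, 6),
              (2, 5, 7), (3, 4, 7), (3, 5, 6)]

def incidence_matrix(lines, num_points):
    """Rows are indexed by points 1..num_points, columns by lines."""
    return [[1 if p in line else 0 for line in lines]
            for p in range(1, num_points + 1)]

def gram(A):
    """Return the matrix product A * A^T as a list of lists."""
    return [[sum(r[k] * s[k] for k in range(len(r))) for s in A] for r in A]

def n_identity_plus_all_ones(n, N):
    """Return the N x N matrix nI + J."""
    return [[n + 1 if i == j else 1 for j in range(N)] for i in range(N)]
```

Here `gram(incidence_matrix(FANO_LINES, 7))` should equal `n_identity_plus_all_ones(2, 7)`: every diagonal entry is 3 and every other entry is 1.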
Let $x_1, \ldots, x_N$ be indeterminates, and let $x = (x_1, \ldots, x_N)$. Let $xA = z = (z_1, \ldots, z_N)$; then
$z_1, \ldots, z_N$ are linear combinations of $x_1, \ldots, x_N$ with integer coefficients. We have
$$zz^T = xAA^Tx^T = n\,xx^T + xJx^T,$$
that is,
$$z_1^2 + \cdots + z_N^2 = n(x_1^2 + \cdots + x_N^2) + w^2,$$


where $w = x_1 + \cdots + x_N$. We take a new indeterminate $x_{N+1}$, and add $n x_{N+1}^2$ to both sides of the
above equation. Note that $N + 1$ is divisible by 4. Write $n = a_1^2 + a_2^2 + a_3^2 + a_4^2$ (by Fact 3), and use
the four-squares identity (Fact 1) to write

$$n(x_{4i+1}^2 + \cdots + x_{4i+4}^2) = y_{4i+1}^2 + \cdots + y_{4i+4}^2,$$

where the $y$'s are linear combinations of the $x$'s. We have
$$z_1^2 + \cdots + z_N^2 + n x_{N+1}^2 = y_1^2 + \cdots + y_{N+1}^2 + w^2.$$

In the next step, we make a number of specialisations, each expressing some $x_i$ as a rational
linear combination of other $x$'s. Note that the quadratic is positive definite, so, no matter how we do
this, the resulting form will involve all the variables. To begin with, $x_1$ is involved in at least one $y$
and at least one $z$; without loss of generality, it is involved in $y_1$ and $z_1$. If it has different coefficients
in these two expressions, we impose the condition $y_1 = z_1$; otherwise, we impose $y_1 = -z_1$. In either
case, we can express $x_1$ in terms of the other $x$'s; and also $z_1^2 = y_1^2$, so this term can be cancelled.
Now repeat this process to cancel the terms $y_i^2$ and $z_i^2$ for $i = 2, \ldots, N$, obtaining finally

$$n x_{N+1}^2 = y_{N+1}^2 + w^2,$$
where $y_{N+1}$ and $w$ are rational linear combinations (that is, rational multiples) of $x_{N+1}$. So we can
choose a non-zero integer value of $x_{N+1}$ such that $y_{N+1}$ and $w$ are also integers, and we have a non-zero
solution of the above equation in integers. By Fact 4, $n$ is a sum of two integer squares. The theorem
is proved.
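
The statement of the theorem translates directly into a test on $n$. The sketch below lists the orders it excludes; the function names are illustrative, and it uses a brute-force search for two squares rather than anything from the proof. Note that passing the test says nothing about whether a plane of that order exists.

```python
from math import isqrt

def is_sum_of_two_squares(n):
    """True if n = a^2 + b^2 for some integers a, b >= 0 (brute force)."""
    for a in range(isqrt(n) + 1):
        b2 = n - a * a
        b = isqrt(b2)
        if b * b == b2:
            return True
    return False

def excluded_by_bruck_ryser(n):
    """True if the theorem rules out a projective plane of order n."""
    return n % 4 in (1, 2) and not is_sum_of_two_squares(n)

def excluded_orders(limit):
    """All orders n with 2 <= n <= limit excluded by the theorem."""
    return [n for n in range(2, limit + 1) if excluded_by_bruck_ryser(n)]
```

For example, `excluded_orders(30)` gives 6, 14, 21, 22 and 30.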

We now return to the proofs of the four ‘facts’. 

PROOF OF FACT 1. Straightforward calculation. But the result has a deeper significance. The
quaternions are a number system $\mathbb{H}$ extending the complex numbers. They have the form
$$a = a_1 + a_2 i + a_3 j + a_4 k,$$
where $i^2 = j^2 = k^2 = -1$, $ijk = -1$, from which it follows that $ij = k$, $ji = -k$, $jk = i$, $kj = -i$,
$ki = j$, $ik = -j$. It is easily checked that
$$(a_1 + a_2 i + a_3 j + a_4 k)(x_1 + x_2 i + x_3 j + x_4 k) = y_1 + y_2 i + y_3 j + y_4 k,$$
where $y_1, \ldots, y_4$ are as in Fact 1. There is a 'norm' defined on the quaternions by
$$\|a_1 + a_2 i + a_3 j + a_4 k\| = a_1^2 + a_2^2 + a_3^2 + a_4^2;$$
the four-squares identity says that
$$\|a\| \cdot \|x\| = \|ax\|.$$

If we treat the complex numbers similarly, using the norm $\|a\| = |a|^2$, we obtain a 'two-squares
identity'
$$(a_1^2 + a_2^2)(x_1^2 + x_2^2) = (a_1x_1 - a_2x_2)^2 + (a_1x_2 + a_2x_1)^2.$$
There is also an 'eight-squares identity', related to a number system called the octonions or Cayley
numbers.


PROOF OF FACT 2. We are given that $rp = x_1^2 + x_2^2$, for some positive $r$; take an expression of this
form in which $r$ is as small as possible. We have to prove that $r = 1$. So suppose not. Choose $u_1, u_2$
such that $u_1 \equiv x_1 \pmod r$, $u_2 \equiv -x_2 \pmod r$, and $|u_i| \le r/2$ for $i = 1, 2$. Then

$$u_1^2 + u_2^2 \equiv x_1^2 + x_2^2 \equiv 0 \pmod r,$$
say $u_1^2 + u_2^2 = rs$. Then $s < r$, because of the bounds on $u_1$ and $u_2$. We have
$$r^2 sp = (x_1^2 + x_2^2)(u_1^2 + u_2^2) = (x_1u_1 - x_2u_2)^2 + (x_1u_2 + x_2u_1)^2$$
by the two-squares identity. We have
$$x_1u_1 - x_2u_2 \equiv x_1^2 + x_2^2 \equiv 0 \pmod r$$
and
$$x_1u_2 + x_2u_1 \equiv -x_1x_2 + x_2x_1 = 0 \pmod r,$$
so the equation has a factor $r^2$, and we obtain
$$sp = y_1^2 + y_2^2$$
for $y_1 = (x_1u_1 - x_2u_2)/r$, $y_2 = (x_1u_2 + x_2u_1)/r$. But this contradicts our choice of $r$, since $s < r$.
The argument for four squares is very similar.
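
The proof above is effectively an algorithm: starting from any solution of $x_1^2 + x_2^2 \equiv 0 \pmod p$, each round replaces $r$ by the smaller $s$ until $r = 1$. The following sketch runs this descent. The names are illustrative; the initial reduction of $x_1, x_2$ modulo $p$ (to guarantee $r < p$) and the brute-force search in `sqrt_minus_one` are additions of this example, not steps taken from the text.

```python
def centred_residue(v, r):
    """The residue u of v modulo r with |u| <= r/2."""
    u = v % r
    if 2 * u > r:
        u -= r
    return u

def two_squares_by_descent(p, x1, x2):
    """Given x1^2 + x2^2 = 0 (mod p), x1, x2 not both divisible by p,
    return (a, b) with a^2 + b^2 = p, following the descent in the proof."""
    assert (x1 * x1 + x2 * x2) % p == 0 and (x1 % p or x2 % p)
    # Assumption of this sketch: reduce first, so that r < p.
    x1, x2 = centred_residue(x1, p), centred_residue(x2, p)
    r = (x1 * x1 + x2 * x2) // p
    while r > 1:
        u1 = centred_residue(x1, r)    # u1 = x1 (mod r)
        u2 = centred_residue(-x2, r)   # u2 = -x2 (mod r)
        # Both numerators are divisible by r, so the divisions are exact.
        x1, x2 = (x1 * u1 - x2 * u2) // r, (x1 * u2 + x2 * u1) // r
        r = (x1 * x1 + x2 * x2) // p
    return abs(x1), abs(x2)

def sqrt_minus_one(p):
    """Brute-force search for x with x^2 = -1 (mod p); None if there is none."""
    for x in range(1, p):
        if (x * x + 1) % p == 0:
            return x
    return None
```

For $p = 29$ the search gives $x = 12$ (since $145 = 5 \cdot 29$), and one round of descent turns $(12, 1)$ into $(5, -2)$, so $29 = 5^2 + 2^2$.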


3 These formulae were discovered by Hamilton, while walking by a canal in Dublin. He was so 
pleased with his discovery that he wrote it on a bridge he passed.

PROOF OF FACT 3. According to the four-squares identity, if two numbers are sums of four squares,
then so is their product. So it will suffice to show that every prime is the sum of four squares. Clearly
$2 = 1^2 + 1^2 + 0^2 + 0^2$, so we need only deal with odd primes.

We need another fact. Let $p$ be an odd prime. A non-zero congruence class $m$ mod $p$ is called
a quadratic residue (QR) if the congruence $m \equiv x^2 \pmod p$ is solvable, and a quadratic non-residue (QNR)
otherwise. Now, of the $p - 1$ non-zero congruence classes, half are QRs and half are QNRs, and the product
of two QNRs is a QR. (See Exercise 13.)

Now we separate two cases.
Case 1: $-1$ is a QR. In other words, the congruence $x^2 + 1 \equiv 0 \pmod p$ has a solution. By Fact 2,
$p$ is a sum of two squares, and hence of four squares (taking the other two to be zero).
Case 2: $-1$ is a QNR. Let $m$ be the smallest positive QNR. Then $-m$ and $m - 1$ are QRs, and so
the congruences $x^2 \equiv m - 1 \pmod p$, $y^2 \equiv -m \pmod p$ are solvable. But then

$$x^2 + y^2 + 1^2 \equiv 0 \pmod p,$$
and by Fact 2, $p$ is a sum of four squares.


PROOF OF FACT 4. First, we argue that it suffices to prove the result for squarefree numbers $n$. For
suppose it is true for squarefree $n$, and let $n = mu^2$ with $m$ squarefree; let $x^2 + y^2 = nz^2$, where
$x, y, z$ are not all zero. Then $x^2 + y^2 = m(uz)^2$. By assumption, $m$ is a sum of two squares, say
$m = a^2 + b^2$; and then $n = (au)^2 + (bu)^2$.

So let $n$ be squarefree, say $n = p_1 \cdots p_k$, where $p_1, \ldots, p_k$ are distinct primes; and suppose that
$x^2 + y^2 = nz^2$, where $x, y, z$ are not all zero. We may suppose that $x, y, z$ have no common factor.
Then $x$ and $y$ are not both divisible by $p_i$; for if they were, then $p_i^2$ divides $nz^2$, contradicting the
facts that $p_i^2$ doesn't divide $n$ and that $p_i$ doesn't divide $z$. Now by Fact 2 (or trivially if $p_i = 2$), $p_i$ is a sum of two squares.
This holds for all $i$. By applying the two-squares identity $k - 1$ times, we see that $n$ is a sum of two
squares, as required.


9.9. Appendix: Finite fields 


This section gives an algebraic proof of the basic existence result (due to Galois) for 
finite fields, cited in the first section of this chapter. The details may be somewhat 
sketchy, but a standard algebra textbook will fill them in for you. 


The proof requires a technical result, the uniqueness of the splitting field. First, a definition. Let $F$
be a field. We call a field containing $F$ an extension of $F$. Let $E_1$, $E_2$ be two extensions of $F$. We say
that $E_1$ and $E_2$ are $F$-isomorphic if there is an isomorphism from $E_1$ to $E_2$ which fixes every element
of $F$.


Step 1. Let $F$ be a field, $f(x)$ an irreducible polynomial over $F$. Then there exists an extension $E$ of
$F$ such that $f(x)$ has a root in $E$. Any two such fields which are minimal with respect to inclusion
are $F$-isomorphic.


An example of such a field is the quotient ring $F[x]/(f(x))$, where $F[x]$ is the polynomial ring
over $F$ and $(f(x))$ the ideal generated by $f(x)$. (Since $f$ is irreducible, the ideal it generates is
maximal, and the quotient is a field.) Now, if $E_1$ and $E_2$ are minimal extensions of $F$ containing
roots $\alpha_1$ and $\alpha_2$ of $f(x)$ respectively, then every element of $E_i$ is expressible as a polynomial $g(\alpha_i)$ in
$\alpha_i$ with coefficients in $F$, two polynomials representing the same element if and only if their difference
is divisible by $f$; and the map which takes $g(\alpha_1)$ to $g(\alpha_2)$ is an $F$-isomorphism from $E_1$ to $E_2$.

The unique minimal extension of $F$ containing $\alpha$ is denoted by $F(\alpha)$.

It follows by an easy induction that, if $f(x)$ is any polynomial over $F$, then there is an extension
$E$ of $F$ such that $f$ has all its roots in $E$; that is, $f$ can be factorised into linear factors over $E$. (Just
adjoin roots of $f(x)$ one at a time.) A minimal such extension is called a splitting field of $f(x)$ over
$F$.

The degree of a field extension $E$ of $F$ is its dimension as a vector space over $F$ (when we forget
multiplication in $E$ and remember only how to add elements of $E$ or multiply them by elements of
$F$).

Step 2. Any two splitting fields of $f(x)$ over $F$ are $F$-isomorphic.


This is proved by induction on the degree of one of the splitting fields. If the degree is 1, so
that $f(x)$ already splits in $F$, the result is clear. So suppose not. Let $E_1$ and $E_2$ be splitting fields
of $f(x)$ over $F$. Let $\alpha_1$ be a root of $f(x)$ in $E_1$ but not in $F$, and $\alpha_2$ a root of the same irreducible
factor of $f(x)$ in $E_2$. Then there is an $F$-isomorphism from $F(\alpha_1)$ to $F(\alpha_2)$ carrying $\alpha_1$ to $\alpha_2$, by
Step 1; so we may suppose that $\alpha_1 = \alpha_2$. Now $E_1$ and $E_2$ are splitting fields for $f(x)$ over $F(\alpha_1)$,
with smaller degree than they have over $F$; by induction, they are $F(\alpha_1)$-isomorphic (and, a fortiori,
$F$-isomorphic).


Now we turn our attention to finite fields. 
Step 3. Let $F$ be a finite field. There exists a prime number $p$ such that $p \cdot a = 0$ for all $a \in F$, where
$p \cdot a = a + a + \cdots + a$ ($p$ terms).


The additive group of $F$ is finite, so its elements have finite order. Suppose that the element 1
has order $p$; that is, $p \cdot 1 = 0$. Then $p$ is prime; for if $p = mn$ with $m, n > 1$, then $(m \cdot 1)(n \cdot 1) = 0$,
but neither $m \cdot 1$ nor $n \cdot 1$ is zero (since, by definition, $p$ is the smallest positive integer $k$ for which $k \cdot 1 = 0$).
But this contradicts the fact that $F$ has no divisors of zero. Finally, $p \cdot a = (p \cdot 1)a = 0$ for every $a \in F$.


The prime p is called the characteristic of F. 
Step 4. The number of elements in a finite field F is a power of the characteristic of F. 


This follows from (9.2.1), once we check that $F$ is a vector space over $Z/(p)$. (In fact, $F$ is an
extension of $Z/(p)$, where $Z/(p)$ consists of the elements $0, 1, \ldots, (p-1) \cdot 1$ of $F$.)


Step 5. If $F$ has $q$ elements, then $F$ is a splitting field of the polynomial $x^q - x$ over $Z/(p)$, where $p$
is the characteristic of $F$.


For the multiplicative group of $F$ has order $q - 1$, so all non-zero elements satisfy $x^{q-1} = 1$,
whence also $x^q = x$; this polynomial is also satisfied by 0. But a polynomial of degree $q$ cannot have
more than $q$ roots; so the elements of $F$ are all the roots, and $F$ is a splitting field.


Now Step 2 shows the uniqueness of the field with q elements, if it exists. 
Step 6. If $q$ is a power of the prime $p$, then the splitting field of $x^q - x$ over $Z/(p)$ has $q$ elements.


The derivative of the polynomial $x^q - x$ is $-1$ (remember that the characteristic divides $q$); this
is coprime to $x^q - x$, so all the roots of the polynomial $x^q - x$ in its splitting field are distinct, so
there are $q$ of them. We have to show that these roots form a field. So let $S$ be the set of roots, and
$a, b \in S$; that is, $a^q = a$ and $b^q = b$. Then

$$(a + b)^q = a^q + b^q = a + b,$$
$$(ab)^q = a^q b^q = ab,$$

so $a + b, ab \in S$; similarly $1/a \in S$ if $a \ne 0$. (The first equation above is non-trivial. We have

$$(a + b)^p = \sum_{i=0}^{p} \binom{p}{i} a^{p-i} b^i = a^p + b^p,$$

since the characteristic is $p$ and $p$ divides all the binomial coefficients $\binom{p}{i}$ for $1 \le i \le p - 1$. Then, by
induction on $k$,

$$(a + b)^{p^k} = a^{p^k} + b^{p^k},$$

and the result follows since $q$ is a power of $p$ (Step 4).) So $S$ is a field of order $q$, completing the
proof.
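
The divisibility of the inner binomial coefficients, on which the equation $(a + b)^p = a^p + b^p$ rests, can be checked directly for small primes. This is a minimal sketch; the function name is illustrative.

```python
from math import comb

def inner_binomials_divisible(m):
    """True if m divides the binomial coefficient C(m, i) for all 1 <= i <= m - 1."""
    return all(comb(m, i) % m == 0 for i in range(1, m))
```

The property holds for primes such as 2, 3, 5, 7, 11 and 13, and it can fail when $m$ is not prime: for $m = 4$ the middle coefficient is 6.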


9.10. Exercises 




1. How many additions and multiplications are needed (in the worst case) to 
transform an m x n matrix into reduced echelon form? 


2. For fixed $q$, show that the probability that a random $n \times n$ matrix over GF($q$) is
non-singular tends to a limit $c(q)$ as $n \to \infty$, where $0 < c(q) < 1$.


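3. Let $F_q(n)$ be the total number of subspaces of an $n$-dimensional vector space
over GF($q$). Prove that $F_q(0) = 1$, $F_q(1) = 2$, and

$$F_q(n+1) = 2F_q(n) + (q^n - 1)F_q(n-1)$$
for $n \ge 1$. [HINT: By (9.2.3) and (9.2.4), we have

$$\begin{bmatrix} n+1 \\ k \end{bmatrix}_q = \begin{bmatrix} n \\ k \end{bmatrix}_q + \begin{bmatrix} n \\ k-1 \end{bmatrix}_q + (q^n - 1)\begin{bmatrix} n-1 \\ k-1 \end{bmatrix}_q.$$
Now sum over $k$.]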


Prove that F,(n) > gr, 


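4. Prove

$$\begin{bmatrix} n+1 \\ k \end{bmatrix}_q = \begin{bmatrix} n \\ k \end{bmatrix}_q + q^{n+1-k} \begin{bmatrix} n \\ k-1 \end{bmatrix}_q$$

in two ways: by using (9.2.3) and (9.2.4), or by dividing the $k \times (n + 1)$ matrices into
two classes according to their first column.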


5. Prove that the right-hand side of the $q$-binomial theorem (9.2.5) for $t = 1$ counts
the number of $n \times n$ matrices in echelon form over GF($q$), that is, satisfying the first
two conditions in the definition of reduced echelon form. How many $n \times n$ matrices
in reduced echelon form are there?


6. Prove that a set of points of a projective space is a flat if and only if it contains 
the line through any two of its points. [The corresponding set of vectors of the 
vector space is closed under scalar multiplication, since it is a union of 1-dimensional 
subspaces. So you must show that the set of vectors is closed under addition if and 
only if the set of points contains the line through any two of its points.] 


7. Show that any set of $m - 2$ MOLS of order $m$ can be enlarged to a set of $m - 1$
MOLS. [HINT: Construct the net corresponding to the given MOLS. Show that its
points fall into $m$ sets of $m$ pairwise non-collinear points; these sets comprise the
'missing' parallel class.]

REMARK. R. H. Bruck generalised this result; he showed that any set of $m - f(m)$
MOLS of order $m$ can be enlarged to a set of $m - 1$ MOLS, where $f(m)$ is a
function of magnitude roughly $m^{1/4}$.

8. Show that there are two non-isomorphic nets of order 4 and degree 3. (The
corresponding Latin squares are the multiplication tables of the two groups of order
4.) Show that one, but not the other, can be enlarged to an affine plane.


9. (a) Prove that there is a unique projective plane of order 3. 
(b) Prove that there is a unique projective plane of order 4.

10. Let $O$ be an oval in a projective plane of even order $q$. Prove that the tangents
to $O$ all pass through a common point $p$, and that $O \cup \{p\}$ is a set of $q + 2$ points
which meets every line in either 0 or 2 points. (Such a set is called a hyperoval. Note
that, if any one of its points is omitted, the resulting set is an oval.) [HINT: Let $x_i$
be the number of points not on $O$ which lie on $i$ tangents. Show that $x_0 = 0$, and
calculate $\sum (i - 1)(i - (q + 1))x_i$.]


11. Prove that, if $q$ is a prime power, then any five points of PG(2, $q$), such that
no three of them are collinear, are contained in a unique conic. Deduce that the
number of conics is

$$(q^2 + q + 1)q^2(q - 1).$$
12. Define a geometry as follows. The points are to be all the 2-element subsets 
of {1,2,3, 4, 5,6}; the lines are all the disjoint triples of 2-subsets. Prove that the 
geometry is a generalised quadrangle with s = t = 2, a = 1. 
13. Let p be an odd prime. Show that half the non-zero congruence classes mod 
p are quadratic residues and half are non-residues, and that the product of two 
non-residues is a residue. [Hint: Any non-zero element of Z/(p) has 0 or 2 square 
roots in Z/(p). Further, multiplying by a fixed non-residue is one-to-one and maps 
residues to non-residues.] 


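14. Write a quaternion formally as $a + \mathbf{x}$, where $a$ is a real number and $\mathbf{x}$ a
3-dimensional vector (relative to the standard basis $(i, j, k)$). Show that
$$(a + \mathbf{x}) + (b + \mathbf{y}) = (a + b) + (\mathbf{x} + \mathbf{y}),$$
$$(a + \mathbf{x}) \cdot (b + \mathbf{y}) = (ab - \mathbf{x}.\mathbf{y}) + (a\mathbf{y} + b\mathbf{x} + \mathbf{x} \times \mathbf{y}),$$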


where x.y and x x y are the usual scalar and vector products (‘dot product’ and 
‘cross product’) of vectors. 


Ramsey’s Theorem 


Complete disorder is impossible 
T. S. Motzkin (attr.) 


Topics: Pigeonhole Principle; Ramsey’s Theorem; estimates for 
Ramsey numbers; applications 


TECHNIQUES: Double induction; probabilistic existence proof 
ALGORITHMS: 
CROSS-REFERENCES: 


In 1930, F. P. Ramsey¹ proved a lemma in a paper on mathematical logic. The lemma
has proved to be of greater importance than the theorem it was used to prove,² and
has given its author's name to an area where combinatorics, logic, topology and
probability interact. Roughly speaking, a theorem of Ramsey theory says that any
structure of a certain type, no matter how 'disordered', contains a much more highly
ordered substructure of the same type.

Several mathematicians (notably Hilbert, Schur and van der Waerden) had
before 1930 proved theorems which are now regarded as part of Ramsey theory. Like
Kafka in Borges' essay,³ Ramsey created his own predecessors; with the hindsight of
Ramsey's Theorem, we can see that these independent results are closely connected.


10.1. The Pigeonhole Principle 


The Pigeonhole Principle is, at first sight, not the kind of thing that you would
expect to be discovered by (and named after) a mathematician. In its simplest form,
it is rather obvious:


(10.1.1) Pigeonhole Principle
If $n + 1$ letters are placed in $n$ pigeonholes, then some pigeonhole
must contain more than one letter.


1 Ramsey was a brilliant economist in the circle of Keynes. Though he died at the age of 26, he had
already made notable contributions to this discipline. He was an atheist, but his younger brother
became Archbishop of Canterbury.
2 This theorem concerned what are now called 'indiscernible sequences'.
3 Jorge Luis Borges, 'Kafka and his precursors', Labyrinths (1964)

We will see, however, that it can be quantified and generalised into some highly
non-trivial mathematics. In any event, it is clear that it is a ‘combinatorial’ result. 
It bears the name of the nineteenth-century German algebraist Dirichlet. He was 
surely not the first person to discover it, but the first to make effective use of it, as 
we will soon see. (By the way, can you give a formal proof?) 

Even in the basic form above, it has many applications. One of these (ordering
elements in a rectangular array) is given as Exercise 1. Here is the application which
Dirichlet made, and which resulted in his name being attached to the principle. It concerns
the existence of good rational approximations to an irrational number. The topic
really belongs to Number Theory, but the argument is combinatorial.


(10.1.2) Proposition. Let $\alpha$ be an irrational number. Then there are infinitely many
different rational numbers $p/q$ for which
$$\left|\alpha - \frac{p}{q}\right| < \frac{1}{q^2}.$$
PROOF. For this proof, we let $\{x\}$ denote the fractional part of the real number $x$,
that is, $\{x\} = x - \lfloor x \rfloor$.

Our strategy is to show: 


For any natural number $n$, there is a rational number $p/q$ with
$q \le n$ such that $|\alpha - p/q| < 1/(nq)$.


Of course, we then have $|\alpha - p/q| < 1/q^2$. Moreover, since $\alpha$ is irrational, $\alpha \ne p/q$,
and we can find $n_1$ with $|\alpha - p/q| > 1/n_1$. Then repeating the argument with
$n_1$ in place of $n$ gives another solution $p_1/q_1$, which is different from $p/q$ (since
$|\alpha - p_1/q_1| < 1/(n_1 q_1) \le 1/n_1 < |\alpha - p/q|$). Continuing this process, we find
infinitely many such 'good' rational approximations.

Consider the $n + 1$ numbers $\{i\alpha\}$, for $i = 1, 2, \ldots, n + 1$. We put these numbers
into the $n$ pigeonholes $(j/n, (j + 1)/n)$, for $j = 0, \ldots, n - 1$. (None of the numbers
coincides with an end-point of the intervals, since $\alpha$ is irrational.) By the Pigeonhole
Principle, some interval contains more than one of the numbers, say $\{i_1\alpha\}$ and
$\{i_2\alpha\}$, which therefore differ by less than $1/n$. Putting $q = |i_1 - i_2|$, we see that there
exists an integer $p$ such that

$$|q\alpha - p| < \frac{1}{n},$$

from which the result follows on division by $q$. Moreover, $q$ is the difference between
two integers in the range $1, \ldots, n + 1$, so $q \le n$.
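
The pigeonhole argument is constructive, and can be followed step by step on a computer. The sketch below does so in floating-point arithmetic, which is an assumption of the example (a float is rational, so this only imitates the argument for moderate $n$); the function name and the clamping of the box index are likewise choices made here.

```python
from math import floor, sqrt

def dirichlet_approximation(alpha, n):
    """Return (p, q) with 1 <= q <= n and |q*alpha - p| < 1/n,
    found by placing the fractional parts {i*alpha} into n pigeonholes."""
    boxes = {}  # pigeonhole index j -> the first i whose {i*alpha} fell in it
    for i in range(1, n + 2):
        frac = i * alpha - floor(i * alpha)
        j = min(floor(frac * n), n - 1)  # clamp guards against float rounding
        if j in boxes:
            i1 = boxes[j]
            q = i - i1
            p = floor(i * alpha) - floor(i1 * alpha)
            return p, q
        boxes[j] = i
    raise AssertionError("unreachable: n + 1 numbers in n pigeonholes")
```

With $\alpha = \sqrt{2}$ and $n = 5$, the numbers $\{\alpha\}$ and $\{6\alpha\}$ both fall in the interval $(2/5, 3/5)$, giving $q = 5$, $p = 7$ and $|5\sqrt{2} - 7| \approx 0.071 < 1/5$.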


Instead of pigeonholes, we use the terminology of colouring. The Pigeonhole 
Principle states that, if r +1 objects are coloured with r different colours, then there 
must be two objects with the same colour. In order to move towards Ramsey's 
Theorem, we quantify the result further as follows. 


(10.1.3) Proposition. Suppose that $n \ge 1 + r(l - 1)$. Let $n$ objects be coloured with $r$
different colours; then there exist $l$ objects all with the same colour. Moreover, the
inequality is best possible.

PROOF. If the conclusion is false, then there are at most $l - 1$ objects of each colour,
hence at most $r(l - 1)$ altogether, contrary to assumption.

When we say that the result is best possible, what we mean is this. If fewer than
$1 + r(l - 1)$ objects are given, then there is some way of colouring them such that no
$l$ have the same colour. This too is obvious: 'fewer than $1 + r(l - 1)$' means 'at most
$r(l - 1)$', and the objects can be divided into $r$ groups with at most $l - 1$ in each
group.

Still more generally, suppose that $n \ge k_1 + \cdots + k_r - r + 1$; let the points of an
$n$-set be coloured with $r$ colours $c_1, \ldots, c_r$. Then, for some value of $i$ in the range
$1, \ldots, r$, there exist $k_i$ points all having colour $c_i$; and this is best possible.


10.2. Some special cases 


We now consider the two-player game introduced in Chapter 1. 

Mark six points on the paper, no three in line (for example, the vertices of a 
regular hexagon). Now the players take turns. On each player’s turn, he draws 
a line in his colour between two of the points which haven't already been joined.
(Crossings of lines other than at marked points are not significant.) The first player 
to create a triangle with all sides of his colour, having three of the marked points 
as vertices, loses. 

The game is finite, since at most $\binom{6}{2} = 15$ edges can be drawn. If you play it
with a friend, you will notice that it always ends in a win for one player; a draw is 
not possible. We prove that this is necessarily so. 


(10.2.1) Proposition. Suppose that the 2-element subsets of a 6-element set are
coloured with two colours. Then there is a 3-element set, all of whose 2-element subsets
have the same colour. This is not true for fewer than six points.


PROOF. Let us suppose that the colours are red and blue; let 1,...,6 be the points.
Consider the five 2-subsets 16, 26, 36, 46, 56. These are coloured with two colours; so
there must be three of the five edges which have the same colour (by the Pigeonhole
Principle with r = 2, l = 3). Let us suppose that 16, 26, and 36 are red. Now there
are two possibilities: if any one of 12, 23, 31 is red (say 12), there is a red triangle
(126); but if none of the three is red, then 123 is a blue triangle.


To show that six is best possible, we must colour the 2-subsets of a 5-set red
and blue without creating a monochromatic (single-coloured) triangle. If the points
are 1, 2, 3, 4, 5, let 12, 23, 34, 45, 51 be red and 13, 24, 35, 41, 52 blue.
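
Both halves of this proposition are small enough to confirm by exhaustive search, since six points have only $2^{15}$ colourings. The sketch below is such a brute-force check, not a replacement for the proof; the points are numbered from 0 rather than 1, colours are coded as 0 (red) and 1 (blue), and the function names are illustrative.

```python
from itertools import combinations, product

def has_mono_triangle(n, colour):
    """colour maps each pair (i, j) with i < j to 0 or 1."""
    return any(colour[(a, b)] == colour[(a, c)] == colour[(b, c)]
               for a, b, c in combinations(range(n), 3))

def every_colouring_has_mono_triangle(n):
    """Try all 2-colourings of the 2-subsets of an n-set."""
    edges = list(combinations(range(n), 2))
    for bits in product((0, 1), repeat=len(edges)):
        if not has_mono_triangle(n, dict(zip(edges, bits))):
            return False
    return True

def pentagon_colouring():
    """Sides of the pentagon 0-1-2-3-4-0 red (0), diagonals blue (1)."""
    return {(i, j): 0 if (j - i) in (1, 4) else 1
            for i, j in combinations(range(5), 2)}
```

The search succeeds for six points and fails for five, and `pentagon_colouring()` is the colouring described in the proof, relabelled.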


Here are some more results of the same type. 


(10.2.2) Proposition. (i) If the 2-subsets of a 9-set are coloured red and blue, there is
either a red 3-set or a blue 4-set.

(ii) If the 2-subsets of an 18-set are coloured red and blue, there is a monochromatic
4-set.

(iii) If the 2-subsets of a 17-set are coloured red, blue and green, there is a monochromatic
3-set.

(iv) All the above are best possible.

PROOF. The proofs all follow the same pattern, except for one trick in the proof of
(i). We prove (i) first for 10 points. Consider the nine edges joining one point $x$
to the others. By the 'more general' form of the Pigeonhole Principle, either there
are four red edges, or six blue edges. Suppose first that there are four red edges;
let $X$ be the set of their four endpoints other than $x$. If $X$ contains a red edge $yz$,
then $xyz$ is a red triangle; else $X$ is a blue 4-set. Now consider the other case, six
blue edges; let $Y$ be the set of their endpoints other than $x$. Now we use the result
proved above, that $Y$ contains a monochromatic triangle $uvw$. If it is red, we are
done; if blue, then $xuvw$ is a blue 4-set.

Now suppose there are just nine points. The only way we can avoid the above
situation is that every point $x$ lies on exactly three red and five blue edges. But this
contradicts the Handshaking Lemma of Chapter 2. (Could there be nine people at
a convention, each of whom shakes hands exactly three times?) So the result holds
for 9 points too.

(ii) Take a set of 18 points and colour the edges. Any point $x$ lies on 17 edges;
by the Pigeonhole Principle, either 9 are red or 9 are blue. Assume the former. By
(i), the endpoints of these 9 edges either contain a red triangle (giving a red 4-set
with $x$), or a blue 4-set (and we are finished).

(iii) Now take 17 points and colour the edges with three colours, red, blue and
green. A point $x$ is joined to 16 others, so by the Pigeonhole Principle six of these edges have the same colour,
say green. If the set $X$ of endpoints of these edges contains a green edge $yz$, we
have a green triangle $xyz$; otherwise all edges within $X$ are red and blue, and there
is a red or blue triangle by our earlier result.

The fact that these are best possible requires construction of colourings with 8,
17 and 16 points, not having monochromatic subsets of the specified sizes. This can
be done, but I don't give details here (but see Exercise 6).


10.3. Ramsey's Theorem

The results above and their manner of proof suggest their generalisation, which is
known as Ramsey's Theorem.


(10.3.1) Ramsey's Theorem
Let $r, k, l$ be given positive integers. Then there is a positive integer
$n$ with the following property. If the $k$-subsets of an $n$-set are
coloured with $r$ colours, then there is a monochromatic $l$-set, i.e.,
one all of whose $k$-subsets have the same colour.


More generally, let $r, k, a_1, \ldots, a_r$ be given. Then there exists $n$ with the property
that, if the $k$-subsets of an $n$-set are coloured with $r$ colours $c_1, \ldots, c_r$, then for some
$i$ in the range $1, \ldots, r$, there is an $a_i$-set, all of whose $k$-subsets have colour $c_i$.

We denote by $R(r, k, l)$ the smallest $n$ for which Ramsey's Theorem holds, and
by $R^*(r, k; a_1, \ldots, a_r)$ the smallest $n$ for which the 'more general' statement holds.
Clearly we have $R(r, k, l) = R^*(r, k; l, l, \ldots, l)$. To become familiar with the notation, note that
we proved the following results: $R(2,2,3) = 6$; $R^*(2,2;3,4) = 9$; $R(2,2,4) = 18$;
$R(3,2,3) = 17$; and

$$R^*(r, 1; a_1, \ldots, a_r) = \sum a_i - r + 1.$$

Moreover, there are some trivial evaluations: $R(r, k, k) = k$, $R(1, k, l) = l$. (We
always assume that $k \le l$, or that all of $a_1, \ldots, a_r$ are at least $k$, otherwise the
assertions are trivial.)
It is also true that

$$R^*(r+1, k; a_1, \ldots, a_r, k) = R^*(r, k; a_1, \ldots, a_r).$$

For, if there is a $k$-set of colour $c_{r+1}$, we have won; otherwise, only the first $r$ colours
occur.

PROOF. The proof of Ramsey's Theorem uses induction, similar to the examples. As the
arguments for (i) and (ii) suggest, we prove the 'more general' assertion. We already
know it for $k = 1$, so assume that $k > 1$. We may assume that $a_i > k$ for all $i$.
By induction, we may assume that the numbers

$$A_i = R^*(r, k; a_1, \ldots, a_{i-1}, a_i - 1, a_{i+1}, \ldots, a_r)$$
are defined (and the statement is true for these).

Take $n = 1 + R^*(r, k-1; A_1, \ldots, A_r)$. Let $X$ be a set of $n$ points, whose $k$-sets
are coloured with $r$ colours $c_1, \ldots, c_r$. Take a point $x \in X$, and let $Y = X \setminus \{x\}$.
We define a colouring of the $(k-1)$-subsets of $Y$ with colours $c_1^*, \ldots, c_r^*$, by the
rule that, for any $(k-1)$-subset $U$, the colour of $U$ is $c_i^*$ if and only if the colour
of $U \cup \{x\}$ is $c_i$. By definition of $n$, for some $i$, there is a $c_i^*$-monochromatic set $Z_i$
of size $A_i$. By definition of $A_i$, the set $Z_i$ contains either a set of size $a_j$ with all its
$k$-sets of colour $c_j$ for some $j \ne i$, or a set $V$ of size $a_i - 1$ with all its $k$-sets of
colour $c_i$. In the first case we have won. In the second case, $\{x\} \cup V$ is a set of size
$a_i$ and all its $k$-subsets have colour $c_i$: by assumption for subsets not containing $x$,
and by the definition of the $c^*$-colouring and the fact that all $(k-1)$-subsets of $V$ have
colour $c_i^*$ in the case of subsets containing $x$.

10.4. Bounds for Ramsey numbers 


It is extremely difficult to calculate exact values of Ramsey numbers. Apart from
the values given in the last section, only four values are known precisely. If you
think this shows weakness on the part of combinatorialists, try deciding any of
the following:

• Is $R^*(2,2;3,8)$ equal to 28 or 29?

• Is $R^*(2,2;4,5)$ equal to 25, 26, 27 or 28?

• Is $R^*(2,3;4,4) = R(2,3,4)$ equal to 13, 14 or 15?

In the absence of exact values, we rely on inequalities, upper and lower bounds.
I stress that upper bounds come from the proof of Ramsey's Theorem or a refinement
of it, and lower bounds from constructions of colourings without large
monochromatic sets.


The proof of Ramsey's Theorem in the last section gives us a 'recurrence
inequality' for the Ramsey numbers, viz.


$$R^*(r, k; a_1, \ldots, a_r) \le 1 + R^*(r, k-1; A_1, \ldots, A_r),$$
where
$$A_i = R^*(r, k; a_1, \ldots, a_{i-1}, a_i - 1, a_{i+1}, \ldots, a_r).$$

In general, this is a very tangled web which is difficult to disentangle into explicit 
bounds. We consider one case where this can be done. 


(10.4.1) Proposition. If $a_1, a_2 \ge 2$, then

$$R^*(2, 2; a_1, a_2) \le \binom{a_1 + a_2 - 2}{a_1 - 1}.$$


PROOF. If $a_1 = 2$, then

$$R^*(2, 2; 2, a_2) = R^*(1, 2; a_2) = a_2 = \binom{2 + a_2 - 2}{1},$$

and the result is true; similarly, if $a_2 = 2$. So we will use induction, assuming the
result is true when either $a_1$ or $a_2$ is reduced. In the notation of Ramsey's Theorem,

$$A_1 = R^*(2, 2; a_1 - 1, a_2) \le \binom{a_1 + a_2 - 3}{a_1 - 2},$$
$$A_2 = R^*(2, 2; a_1, a_2 - 1) \le \binom{a_1 + a_2 - 3}{a_1 - 1},$$

where the inequalities are the inductive hypothesis; so

$$\begin{aligned}
R^*(2, 2; a_1, a_2) &\le 1 + R^*(2, 1; A_1, A_2) \\
&= A_1 + A_2 \\
&\le \binom{a_1 + a_2 - 3}{a_1 - 2} + \binom{a_1 + a_2 - 3}{a_1 - 1} \\
&= \binom{a_1 + a_2 - 2}{a_1 - 1},
\end{aligned}$$

where the second line comes from the Pigeonhole Principle (the case $k = 1$ of
Ramsey's Theorem) and the last is the standard binomial coefficient identity
$\binom{n}{k-1} + \binom{n}{k} = \binom{n+1}{k}$.
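
The induction in this proof can be unrolled mechanically. The sketch below iterates the inequality $R^*(2,2;a_1,a_2) \le A_1 + A_2$ from the base cases $a_1 = 2$ and $a_2 = 2$ and compares the outcome with the binomial coefficient; the function name is illustrative, and the quantity computed is only the upper bound, not the Ramsey number itself.

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def recurrence_bound(a1, a2):
    """Upper bound for R*(2,2; a1, a2) obtained by iterating
    bound(a1, a2) = bound(a1 - 1, a2) + bound(a1, a2 - 1)."""
    if a1 == 2:
        return a2
    if a2 == 2:
        return a1
    return recurrence_bound(a1 - 1, a2) + recurrence_bound(a1, a2 - 1)
```

The bound agrees with $\binom{a_1 + a_2 - 2}{a_1 - 1}$ throughout. It gives 6 for $(3, 3)$, which is exact, but 10 for $(3, 4)$ and 20 for $(4, 4)$, where the true values quoted in the last section are 9 and 18.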


(10.4.2) Corollary. $R(2, 2, l) \le \binom{2l-2}{l-1}$.


PROOF. $R(2, 2, l) = R^*(2, 2; l, l)$ by definition.


The right-hand side here is less than $2^{2l-2} = 4^{l-1}$, since the sum of all binomial
coefficients $\binom{2l-2}{i}$ is equal to $2^{2l-2}$. Moreover, it is larger than $4^{l-1}/(2l - 1)$, since
there are $2l - 1$ of these binomial coefficients, and the middle one $\binom{2l-2}{l-1}$ is the largest.


So the upper bound grows exponentially with constant 4. We conclude this section
by proving a lower bound for this Ramsey number, which is also exponential, but
with the smaller constant $\sqrt{2}$. (The true order of magnitude is not known.) The
proof uses an important combinatorial technique known as the Probabilistic Method.


(10.4.3) Proposition. R(2,2, 1) > 2-2)”, 


Proof. Let X be a set of n points; the size of n will be specified later. We consider 
all possible colourings of the 2-subsets of X with two colours (red and blue, say). 
Since there are G = n(n — 1)/2 pairs, there are 2°"-1)/? such colourings. 

How many of these colourings contain a monochromatic l-subset? There are 

7) choices of an l-set L. For each choice, L is monochromatic in a proportion 

gfoii-V/2 = 2t-l-1/2 of all the colourings; for, of the 2-)/? ways in which 
the colours could fall on the 2-subsets of L, only two are monochromatic. So 
the number of colourings which contain a monochromatic /-set does not exceed a 
fraction (7) 21-U-)/2 of the total. (The number could in principle be calculated 
exactly, using PIE; but this bound is good enough.) 

Now suppose that $n = \lfloor 2^{(l-2)/2} \rfloor$. Then

$\binom{n}{l} 2^{1-l(l-1)/2} < n^l 2^{-l(l-2)/2} \le 1,$

the first inequality holding since $\binom{n}{l} < n^l$ and $1 - l(l-1)/2 \le -l(l-2)/2$, and
the second by definition of n. In other words, the proportion of colourings having
a monochromatic l-set is strictly less than 1. This means that there exists some
colouring which has no monochromatic l-set. Hence $R(2,2,l) > n = \lfloor 2^{(l-2)/2} \rfloor$,
whence $R(2,2,l) > 2^{(l-2)/2}$, as required.
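
The counting bound in this proof is easy to evaluate exactly. The sketch below is illustrative only: the function names are ours, and `largest_n` simply searches for the largest n at which the bound stays below 1, which is usually somewhat better than the closed-form choice of n used in the proof.

```python
from fractions import Fraction
from math import comb, isqrt


def floor_pow2_half(l: int) -> int:
    """floor(2^((l-2)/2)) for l >= 2, computed with exact integer arithmetic."""
    return isqrt(2 ** (l - 2))


def mono_fraction_bound(n: int, l: int) -> Fraction:
    """Upper bound C(n,l) * 2^(1 - l(l-1)/2) on the proportion of colourings
    of the pairs of an n-set that contain a monochromatic l-set."""
    return Fraction(2 * comb(n, l), 2 ** (l * (l - 1) // 2))


def largest_n(l: int) -> int:
    """Largest n for which the bound above is still strictly less than 1."""
    n = l - 1  # here C(n, l) = 0, so the bound is 0
    while mono_fraction_bound(n + 1, l) < 1:
        n += 1
    return n


for l in range(2, 13):
    n = floor_pow2_half(l)
    assert mono_fraction_bound(n, l) < 1  # the choice of n made in the proof
    print(l, n, largest_n(l))
```

For l = 4, for instance, the bound is 15/32 when n = 6, so more than half of all colourings of the pairs of a 6-set have no monochromatic 4-set.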


The argument can be re-phrased as follows. Instead of considering the set of
all colourings, and calculating the proportion that have a monochromatic l-set, we
can instead speak of the probability that a random colouring has a monochromatic
l-set. This probability p is bounded by the expected number of monochromatic l-sets
in a random colouring, which is equal to $\binom{n}{l} 2^{1-l(l-1)/2}$ (the number of l-sets times
the probability that a given l-set is monochromatic). No mention of inclusion and
exclusion is required. It is this interpretation which led to the term ‘probabilistic
method’ for this type of argument.

In more detail: 

Colour at random the set of all 2-subsets of the given n-set X, where each set has probability
1/2 of being red and 1/2 of being blue, with decisions about different sets independent. Now consider
any l-set Y. It has $\binom{l}{2} = l(l-1)/2$ subsets of size 2. The probability that all are red is $2^{-l(l-1)/2}$,
with the same probability that all are blue; so the probability that Y is monochromatic is twice this
number, or $2^{1-l(l-1)/2}$.

The expected number of monochromatic l-sets is equal to this probability multiplied by the total
number of l-subsets, hence $\binom{n}{l} 2^{1-l(l-1)/2}$. If n and l are such that this expected value is less
than one, then it cannot occur that there
is at least one monochromatic set in every colouring; hence there exists a colouring containing no
monochromatic l-set.

However the argument is phrased, note that it is a non-constructive existence 
proof: it shows that there must be a way of doing the colouring so that no 
monochromatic l-sets are created, but it gives us absolutely no indication of how to 
find one (except, possibly, choosing the colouring at random and trusting to luck). 
It is generally regarded as ‘better’ to have an explicit construction of an object, in
such a way that it is possible to verify directly that it has the required properties, 
than to have only an existence proof. 
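
For small cases, ‘trusting to luck’ can actually be tried. The following sketch colours the pairs of an n-set at random and tests each l-subset by brute force; the function names, the seed and the retry limit are our own illustrative choices. It is only practical for very small n and l.

```python
import random
from itertools import combinations


def has_mono_subset(colour, n, l):
    """True if some l-subset of range(n) has all of its pairs the same colour."""
    for subset in combinations(range(n), l):
        seen = {colour[pair] for pair in combinations(subset, 2)}
        if len(seen) == 1:
            return True
    return False


def lucky_colouring(n, l, seed=0, max_tries=1000):
    """Try random red/blue colourings of the pairs of range(n) until one
    has no monochromatic l-set; return it, or None if every try fails."""
    rng = random.Random(seed)
    pairs = list(combinations(range(n), 2))
    for _ in range(max_tries):
        colour = {pair: rng.choice("RB") for pair in pairs}
        if not has_mono_subset(colour, n, l):
            return colour
    return None


colouring = lucky_colouring(6, 4)
print(colouring)
```

With n = 6 and l = 4 the expected number of monochromatic 4-sets is 15/32, so each random attempt succeeds with probability at least 17/32.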


10.5. Applications 
Here are some applications of Ramsey's Theorem. In the first case, there is a 
beautiful direct argument giving the exact bound. 


(10.5.1) Proposition. There is a function f(m,n) with the following property:
If $x_1, x_2, \ldots, x_N$ is any sequence of distinct real numbers with N > f(m,n),
then there is either a monotone increasing subsequence of length greater than m,
or a monotone decreasing subsequence of length greater than n.


Here is the proof using Ramsey’s Theorem. We take $f(m,n) = R^*(2,2;m+1,n+1) - 1$.
Suppose that N > f(m,n), and we are given a sequence of N distinct
real numbers. Take X = {1,...,N}, and colour the 2-subsets of X as follows:
given a 2-set {i,j}, with i < j, colour it red if $x_i < x_j$, blue if $x_i > x_j$. Since
$|X| \ge R^*(2,2;m+1,n+1)$, there is either a red (m + 1)-set or a blue (n + 1)-set.
But a red set indexes a monotone increasing subsequence; for if $n_1 < n_2 < \cdots$ and
all edges are red, then $x_{n_1} < x_{n_2} < \cdots$. Similarly, a blue set indexes a monotone
decreasing subsequence.


Now here is the elegant direct proof, due to Erdős and Szekeres. We take the 
function f(m,n) to be simply mn. So suppose that we have a sequence of mn + 1 
distinct real numbers, and suppose that it contains no monotone increasing sequence 
of length m + 1 or greater. For i = 1,...,m, let

$K_i = \{k : \text{the longest monotone increasing sequence ending at } x_k \text{ has length } i\}.$

Now we have partitioned the set {1,2,...,mn + 1} into m subsets $K_1,\ldots,K_m$. By
the Pigeonhole Principle, some one of these sets, say $K_i$, contains at least n + 1
members.

Now we claim that $K_i$ indexes a monotone decreasing subsequence. For suppose
that $k, l \in K_i$ with k < l and $x_k < x_l$. Now, by definition of $K_i$, there is a monotone
increasing sequence of length i ending at $x_k$, say $x_{j_1} < x_{j_2} < \cdots < x_k$. But then

$x_{j_1} < x_{j_2} < \cdots < x_k < x_l$

is a monotone increasing sequence of length i + 1 ending at $x_l$, contradicting the
fact that $l \in K_i$. This claim establishes the result.
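
The direct proof is constructive, and can be followed step by step in code. The sketch below is our own illustration (the function name and the return convention are assumptions, and positions are indexed from 0 rather than 1): it computes, for each position k, the length of the longest increasing subsequence ending there, and then either traces back an increasing subsequence of length m + 1 or reads off a decreasing one from the largest class.

```python
def monotone_subsequence(xs, m, n):
    """Given more than m*n distinct numbers, return ("increasing", idx) with
    len(idx) == m + 1 or ("decreasing", idx) with len(idx) == n + 1, where
    idx lists positions in xs in increasing order."""
    assert len(xs) > m * n and len(set(xs)) == len(xs)
    # length[k]: length of the longest increasing subsequence ending at xs[k]
    # prev[k]:   position of the previous term of one such subsequence
    length, prev = [], []
    for k, x in enumerate(xs):
        best, arg = 1, None
        for j in range(k):
            if xs[j] < x and length[j] + 1 > best:
                best, arg = length[j] + 1, j
        length.append(best)
        prev.append(arg)
    for k in range(len(xs)):
        if length[k] > m:  # the first such k has length exactly m + 1
            idx = [k]
            while len(idx) < m + 1:
                idx.append(prev[idx[-1]])
            return "increasing", idx[::-1]
    # Every length lies in 1..m, so the classes K_i partition the positions,
    # and by the Pigeonhole Principle one class has at least n + 1 members.
    classes = {}
    for k, i in enumerate(length):
        classes.setdefault(i, []).append(k)
    big = max(classes.values(), key=len)
    return "decreasing", big[: n + 1]


print(monotone_subsequence([1, 3, 5, 0, 2, 4, 6], 3, 2))
print(monotone_subsequence([5, 4, 3, 2, 1], 2, 2))
```

The quadratic-time computation of `length` is chosen for clarity; faster methods exist but would obscure the correspondence with the proof.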


The bound f(m,n) = mn is best possible. For consider the mn numbers 
$n-1, 2n-1, \ldots, mn-1,\; n-2, 2n-2, \ldots, mn-2,\; \ldots,\; 0, n, \ldots, (m-1)n.$


It is not hard to check that the longest increasing subsequence has length m, and 
the longest decreasing subsequence has length n. 


Another application is due to Erdős and Szekeres. A set of points in the
Euclidean plane is convex if it contains the line segment joining any two of its
points. The convex hull of a set S of points is the smallest convex set containing
S. It can also be described as the set of linear combinations of points in S, where
the coefficients in the linear combination are restricted to being non-negative and 
having sum 1. A convex polygon is a finite set of points, none of which lies in the 
convex hull of the others. Another description is that each of the points lies on a 
line with the property that all the other points are on the same side of the line. 


(10.5.2) Proposition. There is a function f such that, given any f(n) points in the 
plane with no three collinear, some set of n of the points form a convex polygon. 


Proof. We need two preliminary facts:


Fact 1. Given any five points in the plane, no three collinear, some four of the
points form a convex quadrilateral.⁴

This is clear if the convex hull of the points is a pentagon or quadrilateral. So
suppose that it is a triangle, with vertices A, B, C, and let D and E be the remaining
points. Then the line DE meets two sides of the triangle, say AB and AC; and the
quadrilateral BCDE is convex.


Fact 2. Given a set of n points in the plane, if every four points form a convex 
quadrilateral, then all n points form a convex polygon. 
The proof is an exercise. 


Now let f(n) = R*(2,4;5,n). Given f(n) points in the plane, colour a 4-set red
if it is a convex quadrilateral, blue otherwise. By Fact 1, there is no blue 5-set. So 
there is a red n-set; and, by Fact 2, it is a convex polygon with n points. 


The exact value of the function f(n) is unknown. 


10.6. The infinite version 


As our very last item, we mention without proof the infinite version of Ramsey's 
Theorem. As usual, the prototype is the Pigeonhole Principle: 
If the elements of an infinite set are coloured with finitely many colours, then 
there is an infinite monochromatic subset. 
Ramsey’s theorem generalises to colourings of the k-subsets of an infinite set with 
finitely many colours: 


(10.6.1) Ramsey’s Theorem (infinite form)
Let X be an infinite set, and k and r positive integers. Suppose
that the k-subsets of X are coloured with r colours. Then there
is an infinite subset Y of X, all of whose k-subsets have the same
colour.


4 This special case of (10.5.2), due to Esther Klein, was the inspiration for the general result, which
involved an independent discovery of Ramsey's Theorem by Erdős and Szekeres. See the comments
by Szekeres in the introduction to the volume of selected papers by Paul Erdős, The Art of Counting
(1973).


We will discuss this result, and various extensions of it, in Section 19.4. 


A remarkable recent discovery in logic is that it is possible to deduce the 
finite form of Ramsey’s theorem from the infinite, but not vice versa. This fact 
has important spin-offs in logic, notably a variant of the finite form (the
‘Paris-Harrington Theorem’) which is true but not provable from the axioms for the natural
numbers (essentially because the ‘Paris-Harrington numbers’ grow so fast that they 
are not provably computable). But we cannot follow this any further. 


10.7. Exercises 


1. A platoon of soldiers (all of different heights) is in rectangular formation on a 
parade ground. The sergeant rearranges the soldiers in each row of the rectangle 
in decreasing order of height. He then rearranges the soldiers in each column in 
decreasing order of height. Using the Pigeonhole Principle, prove that it is not 
necessary to rearrange the rows again; that is, the rows are still in decreasing order 
of height. 


2. Show that any finite graph contains two vertices lying on the same number of
edges. 


3. (a) Show that, given five points in the plane with no three collinear, the number 
of convex quadrilaterals formed by these points is odd. 
(b) Prove Fact 2 in the proof of (10.5.2).


4. Show that, if N > mnp, then any sequence of N real numbers must contain either 
a strictly increasing subsequence with length greater than m, a strictly decreasing 
subsequence with length greater than n, or a constant subsequence of length greater 
than p. Show also that this result is best possible. 


5. (a) Show that any infinite sequence of real numbers contains an infinite
subsequence which is either constant or strictly monotonic.

(b) Using the Principle of the Supremum,⁵ prove that every increasing sequence
of real numbers which is bounded above is convergent.

(c) Hence prove the Bolzano-Weierstrass Theorem: Every bounded sequence of
real numbers has a convergent subsequence.


5 The Principle of the Supremum is the basic principle expressing the completeness of the real
number system. It asserts that, if a non-empty set of real numbers has an upper bound, then it has
a supremum or least upper bound.


6. Let X be the set of residues modulo 17. Colour the 2-element subsets of X by
assigning to {x,y} the colour red if


$x - y \equiv \pm 1, \pm 2, \pm 4 \text{ or } \pm 8 \pmod{17},$


blue otherwise. Show that there is no monochromatic 4-set. [HINT: By symmetry, 
we may assume that the 4-set contains 0 and 1; this greatly reduces the number of 
cases to be considered!] 


7. (a) Prove the following theorem of Schur:


Schur’s Theorem
There is a function f on the natural numbers with the property
that, if the numbers {1,2,...,f(n)} are partitioned into n classes,
then there are two numbers x and y such that x, y and x + y all
belong to the same class.


(In other words, the numbers {1,2,...,f(n)} cannot be partitioned into n ‘sum-free
sets’.)
[Hint: Colour the 2-subsets of {1,2,...,N + 1} with n colours, according to the
rule that {x,y} has the i-th colour if |x − y| belongs to the i-th class (where N is some
suitable, sufficiently large, integer).]

(b) State and prove an infinite version of Schur’s Theorem. 


8. A delta-system is a family of sets whose pairwise intersections are all equal. (So, 
for example, a family of pairwise disjoint sets is a delta-system.) Prove the existence 
of a function f of two variables such that any family F of at least f(n,k) sets of
cardinality n contains k sets forming a delta-system.
[Hint: Construct a sequence of sets $A_1, A_2, \ldots$ in F, and a sequence $F_1, F_2, \ldots$ of
subfamilies, such that
• $F_i \supseteq F_{i+1}$ for all i;
• $A_i \cap A = A_i \cap A'$ for all $A, A' \in F_i$;
• $A_j \in F_i$ for all j > i.
Show that
• the sequence can be continued for m terms if F is sufficiently large (in terms of
m and n);
• if the sequence continues for (k − 1)(n + 1) + 1 terms, then some k of the sets
$A_i$ form a delta-system.]
State and prove an infinite version of this theorem. 


Do you regard this theorem as part of ‘Ramsey theory’? 


9. Why are constructive existence proofs more satisfactory than non-constructive
ones?


Only connect!


E. M. Forster, Howards End (1910) 


TOPICS: Graph properties related to paths and cycles, especially
trees, Eulerian and Hamiltonian graphs; networks, Max-Flow Min-Cut
and related theorems; [Moore graphs]


TECHNIQUES: Algorithmic proofs; approximate solutions; [Eigenvalue
techniques]


ALGORITHMS: Graph algorithms; greedy algorithm; stepwise
improvement


CROSS-REFERENCES: Trees (Chapter 4); Hall’s Marriage Theorem
(Chapter 6); [de Bruijn-Erdős theorem (Chapter 7)]


We have met graphs several times before, in various guises. Now, we return to 
them, and consider them more systematically. Graphs describe the connectedness 
of systems; typically, they model transport or communication systems, electrical 
networks, etc. In this chapter, we concentrate on issues related to this aspect. In 
Part 2, we return to graphs and look at colouring problems.

Graph theory is a cuckoo in the combinatorial nest;¹ it has grown to the
status of an independent discipline, though still closely linked with other parts of
combinatorics.


11.1. Definitions 


We have defined a graph to consist of a set V of vertices equipped with a set E of
2-subsets of V called edges. Sometimes it is necessary to broaden the definition.²
In particular, we may want to allow loops, which are edges joining vertices to
themselves; multiple edges, more than one edge between the same pair of vertices;
and directed edges, which have an orientation so that they go from one vertex to
another.³ The exact details of the formal mathematical machinery needed to define
all these concepts are not too important; just note that directed edges are easily
represented as ordered pairs rather than 2-subsets of vertices. A graph with some
or all of these extended features is called a general graph; in particular, if it has
directed edges, it is a directed graph or digraph, and if it has multiple edges, it is a
multigraph.


1 This comment is not a disparagement. Graph theory has been successful because it provides
mathematicians with a large supply of interesting problems, many of them related to applications.


2 Where necessary to avoid confusion, the structure just defined is called a simple graph.

Most of these concepts can be expressed in the language of relations introduced 
in Section 3.8. Since knowing a graph involves knowing which pairs of vertices 
are adjacent, we can regard a graph as a binary ‘adjacency’ relation on the vertex 
set. For a simple graph, adjacency is irreflexive and symmetric; relaxing these two 
conditions allows loops and directed edges respectively. However, multiple edges 
cannot easily be described in this language. 

For the most part, we consider only undirected graphs without loops; but we 
sometimes need to allow multiple edges. The exception is Section 11.9; a network is 
most naturally based on a directed graph. 

In a simple graph, we say that vertices x and y are adjacent if {x,y} is an edge;
they are non-adjacent otherwise. 


We write G = (V, E) for a graph G with vertex set V and edge set E. 


Two simple but important kinds of graphs are complete graphs, in which every 
pair of vertices is an edge; and null graphs, having no edges at all. The complete and 
null graphs on n vertices are denoted by $K_n$ and $N_n$ respectively. Other important
graphs will appear from time to time. 


A subgraph of a graph G = (V, E) is a graph whose vertex and edge sets are 
subsets of those of G. Note that, if G’ = (V’, E’) is a subgraph of G, then for every
edge e ∈ E’, it must hold that both the vertices of e lie in V’.

Two kinds of subgraphs are of particular importance. An induced subgraph of 
G is a subgraph G’ = (V’, E’) whose edge set consists of all the edges of G which 
have both ends in V’. A spanning subgraph is one whose vertex set is the same as 
that of G. Thus, for example, every graph with at most n vertices is a subgraph of 
$K_n$, and every graph with exactly n vertices is a spanning subgraph; but the only
induced subgraphs of $K_n$ are complete graphs.

An induced subgraph is specified by giving its vertex set V’; we speak of the
subgraph induced on the set V’.

Now we have to consider various kinds of routes in graphs. There are several 
different terms to be defined here; the differences are not very important, as you will 
see. My terminology is slightly different from the standard. 

A walk in a graph is a sequence

$(v_0, e_1, v_1, e_2, v_2, \ldots, e_n, v_n),$

where $e_i$ is the edge $\{v_{i-1}, v_i\}$ for i = 1,...,n. We say that it is a walk from $v_0$ to $v_n$.
The length of the walk is the number n of edges in the sequence (or one less than
the number of vertices). It is closed if n > 0 and $v_n = v_0$. Note that there are no
restrictions; when walking, we may retrace our steps arbitrarily.

In a simple graph, the edges in a walk are uniquely determined by the vertices;
so we often speak of the walk $(v_0, v_1, \ldots, v_n)$, defined by the condition that $v_{i-1}$ and
$v_i$ are adjacent for i = 1,...,n.


3 Directed edges could arise in modelling traffic flow in a town with some one-way streets, for
example.

We define special kinds of walks: treks, trails, and paths. A trek is a walk in 
which any two consecutive edges are distinct;⁴ if it is closed, we also require that the
first and last edge are distinct. Thus, a trek is a bit more purposeful than a walk: 
we never retrace the edge we have just used. The last condition ensures that, in a 
closed trek, we can start at any point and the result is still a trek. 

A trail is a walk with all its edges distinct; a path is a walk with all its vertices 
distinct (except perhaps the first and the last). The idea is that a trail might be 
followed by an explorer, who is not interested in revisiting an edge he has once 
explored; while a path proceeds efficiently from one place to another without any 
repetition. Further, we define a circuit to be a closed path. 
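
These definitions translate directly into tests on a vertex sequence. The sketch below assumes a simple graph stored as a dictionary `adj` mapping each vertex to its set of neighbours, and a walk given by its list of vertices; both representations, and all function names, are our own choices. As a further assumption, `is_path` also requires distinct edges, which excludes the degenerate closed walk (u, v, u).

```python
def edges_of(walk):
    """Edges of a walk given as a vertex sequence in a simple graph."""
    return [frozenset(pair) for pair in zip(walk, walk[1:])]


def is_walk(adj, walk):
    return len(walk) >= 1 and all(v in adj[u] for u, v in zip(walk, walk[1:]))


def is_closed(walk):
    return len(walk) > 1 and walk[0] == walk[-1]


def is_trek(adj, walk):
    es = edges_of(walk)
    if not is_walk(adj, walk) or any(e == f for e, f in zip(es, es[1:])):
        return False
    # in a closed trek the first and last edges must also differ
    return not (is_closed(walk) and es[0] == es[-1])


def is_trail(adj, walk):
    es = edges_of(walk)
    return is_walk(adj, walk) and len(set(es)) == len(es)


def is_path(adj, walk):
    # vertices distinct, except that the first and last may coincide
    inner = walk[:-1] if is_closed(walk) else walk
    return is_trail(adj, walk) and len(set(inner)) == len(inner)


def is_circuit(adj, walk):
    return is_closed(walk) and is_path(adj, walk)


# Two triangles 1-2-3 and 3-4-5 sharing the vertex 3.
bowtie = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
for w in ([1, 2, 1], [1, 2, 3, 1, 2], [1, 2, 3, 4, 5, 3], [1, 2, 3, 4], [1, 2, 3, 1]):
    print(w, is_trek(bowtie, w), is_trail(bowtie, w), is_path(bowtie, w))
```

The five sample walks are, in order: a walk that is not a trek, a trek that is not a trail, a trail that is not a path, a path, and a circuit.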

Note that these concepts get progressively stronger; a path is a trail is a trek.⁵
However, from the point of view of connections, there is no essential difference: 


(11.1.1) Proposition. (a) For any distinct vertices x, y of a graph G, the conditions
that there exists a walk, trek, trail or path from x to y are all equivalent.

(b) For any graph G, the conditions that G contains a closed trek, trail or path 
are all equivalent. 


PROOF. Given a walk from x to y, if it is not a trek, then some two consecutive
edges are equal, so that there is a subsequence (v,e,v',e,v). Replacing this by
the single vertex v gives a shorter walk. The process terminates in a trek from x to
y.

Now a trek with a repeated edge must have a repeated vertex; so it suffices to
show that, if there is a trek from x to y (with possibly x = y), then there is a path. If
the vertex v is repeated (but not as the first and last vertex), there is a subsequence 
(v,...,v), which can be replaced by a single v to obtain a shorter trek. Continuing 
this process produces a path. Note that a closed trek cannot be reduced to the trek 
of length zero by this process. 


Now define a relation ≡ on the vertex set V by the rule: x ≡ y if there is a path
(or trail, or trek, or walk) from x to y. We have:


≡ is an equivalence relation on V.


This is straightforward: there is a walk of length 0 from x to x; reversing a walk
from x to y gives a walk from y to x; and following a walk from x to y with a walk
from y to z gives a walk from x to z. (Note that the proof would be untidier if we
used one of the more special types of walk.)


4 A trek with s edges is called an s-arc in the graph-theoretic literature; but this does not convey the
sense of being intermediate in purposiveness between a walk and a trail, and also could be confused
with the use of ‘arc’ for an edge of a directed graph.


5 Mnemonic: a term later in the dictionary describes a wider concept.


This equivalence relation, of course, defines a partition of the vertex set of G.
We define the connected components (or, for short, the components) of G to be the 
subgraphs induced on the equivalence classes. Note that no edge joins points in 
different equivalence classes; so the edge set of G is partitioned into the edge sets 
of its components. 

A graph is connected if it has just one component. Note that any connected 
component of G is indeed a connected graph. 
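
The equivalence classes can be computed by exploring outwards from each vertex not yet reached. This is a minimal sketch, assuming the graph is given as a list of vertices and a list of edges (pairs of vertices); the function names are ours.

```python
def components(vertices, edges):
    """Vertex sets of the connected components, in order of first appearance."""
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, parts = set(), []
    for start in vertices:
        if start in seen:
            continue
        # collect every vertex joined to `start` by a walk
        part, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for v in adj[u] - part:
                part.add(v)
                stack.append(v)
        seen |= part
        parts.append(part)
    return parts


def is_connected(vertices, edges):
    return len(components(vertices, edges)) == 1


print(components([1, 2, 3, 4, 5], [(1, 2), (2, 3), (4, 5)]))
```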


The valency, or degree, of a vertex x of a graph G is the number of edges
containing x.⁶ In a directed graph, we have to distinguish between the out-valency
of a vertex (the number of directed edges starting at that vertex) and the in-valency 
(the number of edges ending there). 

If every vertex of a graph has the same valency, the graph is called regular, and 
the common valency d is the valency of the graph. We call such a graph d-valent, 
and use the terms divalent, trivalent, etc. when d = 2,3, etc. 

Often we will modify a graph G by removing a vertex v and all edges containing 
it, or by removing an edge e, or by adding an edge e joining two vertices not 
previously joined. We use the shorthand notations G − v, G − e, G + e for the results
of these operations. (The strictly correct set-theoretic notation would be much more 
cumbersome, and would depend on the precise kind of graph in question.) 

Sometimes our graphs will carry additional, numerical information: an edge 
may represent a pipeline, for example, and be labelled with its capacity, or the cost 
of building it. Formally, a weight function on a set X is a function from X to the 
non-negative real numbers. A vertex-weighted, resp. edge-weighted, graph is a graph 
with a weight function on the set of vertices, resp. edges. Edge-weighted graphs are 
more common, but we allow either or both types of weight function. 


11.2. Trees and forests 


A tree is a connected graph without circuits. We have met trees before, in Section 
3.10 (where we proved Cayley’s Theorem, that there are $n^{n-2}$ labelled trees on n
vertices) and Section 4.7 (binary trees, in connection with searching and sorting). 


We might expect that a connected graph has ‘many’ edges, and a graph without 
circuits has ‘few’. The next result shows that trees are extremal for both these
properties. We need one piece of notation: a graph without circuits is called a forest 
— its connected components are trees! 


6 Both terms are commonly used. I prefer the first. The term ‘degree’ is over-used in mathematics,
and there is no analogy between the degree of a graph and the degree of a polynomial, permutation
group, etc. On the other hand, anyone who has studied chemistry will recognise the same concept. In
the methane molecule CH₄, the carbon atom has valency 4 and the hydrogen atoms have valency 1.
The standard representation of the methane molecule (structural diagram not reproduced here)
shows the analogy clearly.


(11.2.1) Theorem. (a) A connected graph with n vertices has at least n − 1 edges,
with equality if and only if it is a tree.

(b) A forest with n vertices and m connected components has n — m edges. 
Thus, a forest has at most n — 1 edges, with equality if and only if it is a tree. 


Proof. We show first that a tree has n − 1 edges. This is proved by induction; it is
clear for n = 1. The inductive step depends on the following fact: 


A tree with more than one vertex has a vertex of valency 1. 


Since a tree is connected, it has no isolated vertices (if n > 1); so, arguing by 
contradiction, we can assume that every vertex has valency at least 2. But then there 
are arbitrarily long treks in the graph, since whenever we enter a vertex along one 
edge, we may leave along another. A trek of length greater than n must return to 
a vertex it has visited previously; so there is a closed trek, and hence a circuit, and
we have arrived at a contradiction. So the assertion is proved. 


Now let v be a vertex in the tree T which has valency 1. Let T − v denote the
graph obtained by removing v and the unique edge incident with it. Then T − v has
n − 1 vertices, and contains no circuits. We claim that T − v is connected. This holds
because a path in T between two vertices x, y ≠ v cannot pass through v. Thus
T − v is a tree. By the induction hypothesis, it has n − 2 edges; so T has n − 1 edges.


Now (b) of the theorem follows easily. For let F be a forest with n vertices and
m components $T_1,\ldots,T_m$, with $a_1,\ldots,a_m$ vertices respectively. Then $\sum_{i=1}^{m} a_i = n$.
Now $T_i$ is a tree, and so has $a_i - 1$ edges. So F has

$\sum_{i=1}^{m} (a_i - 1) = n - m$

edges.

To prove (a), let G be any connected graph with n vertices, and suppose that
G is not a tree. Then G contains a circuit C. Let e be an edge in this circuit, and
$G_1 = G - e$ the graph obtained by removing e. Then $G_1$ is still connected. For, if
a path from x to y uses the edge e, then there is a walk from x to y not using e.
(Instead of using e, we traverse the circuit the other way.) Repeating this procedure,
we must reach a tree after, say, r steps. Since r edges are removed, G has n − 1 + r
edges altogether, which is more than n − 1.


Let G be a graph. A spanning forest is a spanning subgraph of G (consisting 
of all the vertices and some of the edges of G) which happens to be a forest. A 
spanning tree is similarly defined. 


(11.2.2) Corollary. Any connected graph has a spanning tree. 


This follows from the argument for part (a) of the theorem above; by removing 
edges from G, we can obtain a spanning tree. There is another way to proceed, 
which will be useful later; this involves building up the spanning tree ‘from below’.


(11.2.3) Spanning tree algorithm
Let G = (V, E) be a connected graph.
Set S = ∅.
WHILE the graph (V, S) is not connected, let e be an edge joining
vertices in different components, and add e to the set S.


RETURN (V, S).


To prove that this algorithm works, we have to show that the choice of e is 
always possible and its addition creates no circuit. Let Y be a connected component 
of (V, S), and Z = V \ Y; choose vertices y,z in Y, Z respectively. In G, there is
a path from y to z; some edge in this path must cross from Y to Z, and this is a 
suitable choice for e. Now suppose that (V,S) + e contains a circuit. If we start, say, 
in Y, and follow this circuit, at some moment we cross into Z by using the edge e; 
then there is no way to return to Y to complete the circuit without re-using e. 
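
Algorithm (11.2.3) can be sketched in a few lines. The version below is one possible realisation, not the only one: it assumes the graph is given as a list of vertices and a list of edges, and it keeps a label `comp[v]` for the component of (V, S) containing v. A single pass through the edge list suffices, because an edge that joins two vertices already in the same component will never become eligible later.

```python
def spanning_tree(vertices, edges):
    """Return the edge set S of a spanning tree of a connected graph."""
    comp = {v: v for v in vertices}  # initially (V, S) has no edges
    tree = []
    for u, v in edges:
        if comp[u] != comp[v]:  # e joins vertices in different components
            tree.append((u, v))
            old, new = comp[v], comp[u]
            for x in comp:  # merge the two components
                if comp[x] == old:
                    comp[x] = new
    if len(tree) != len(comp) - 1:
        raise ValueError("graph is not connected")
    return tree


print(spanning_tree([1, 2, 3, 4], [(1, 2), (2, 3), (3, 1), (3, 4)]))
```

Different orderings of the edge list generally give different spanning trees, which reflects the freedom in the choice of e at each step.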


We see that there is a great deal of freedom in creating spanning trees. How
many are there? Cayley’s Theorem (Section 3.10) can be stated in the form: 


(11.2.4) Cayley’s Theorem. The complete graph $K_n$ has $n^{n-2}$ spanning trees.


For, obviously, any tree on the vertex set {1,...,n} is a spanning tree of the 
complete graph. 

There is a general technique for counting the spanning trees in an arbitrary 
graph, using the adjacency matrix of the graph. This is described in the chapter on 
graph spectra in Beineke and Wilson, Selected Topics in Graph Theory (1977). 
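
One standard technique of this kind is the Matrix-Tree Theorem: if A is the adjacency matrix and D the diagonal matrix of valencies, then the number of spanning trees is the determinant of D − A with one row and the corresponding column deleted. We are assuming, without having checked the reference, that this is the method meant. The sketch below (function name and vertex labelling 0,...,n−1 are our own) uses exact rational arithmetic so that the determinant is not affected by rounding.

```python
from fractions import Fraction
from itertools import combinations


def count_spanning_trees(n, edges):
    """Number of spanning trees of a graph on vertices 0..n-1."""
    # Laplacian matrix D - A
    lap = [[Fraction(0)] * n for _ in range(n)]
    for u, v in edges:
        lap[u][u] += 1
        lap[v][v] += 1
        lap[u][v] -= 1
        lap[v][u] -= 1
    # delete the last row and column, then compute the determinant
    # by Gaussian elimination over the rationals
    m = [row[:-1] for row in lap[:-1]]
    size = n - 1
    det = Fraction(1)
    for c in range(size):
        pivot = next((r for r in range(c, size) if m[r][c] != 0), None)
        if pivot is None:
            return 0
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            det = -det
        det *= m[c][c]
        for r in range(c + 1, size):
            factor = m[r][c] / m[c][c]
            for k in range(c, size):
                m[r][k] -= factor * m[c][k]
    return int(det)


for n in range(2, 7):
    k_n = list(combinations(range(n), 2))  # edges of the complete graph
    print(n, count_spanning_trees(n, k_n), n ** (n - 2))
```

For complete graphs the two printed columns should agree, in accordance with Cayley's Theorem.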


11.3. Minimal spanning trees 


Suppose that n towns are to be linked by a telecommunication network. For each 
pair of towns, the cost of installing a cable between these two towns is known. What 
is the most economical way of connecting all the towns? 

This is known as the minimal connector problem. The data can be regarded as an 
edge-weighted graph. (As described, the graph G in question is the complete graph; 
but this is not essential. We could suppose that, for various reasons, it is impossible 
or uneconomic to connect certain pairs of towns directly.)

The solution to the problem will be that connected spanning subgraph H of the 
graph G of minimal total weight (that is, the sum of the weights of the edges of H 
is as small as possible). Clearly, H must be a tree; for, if not, then edges could be
deleted, reducing the weight, without disconnecting it. The problem is solved by a 
simple-minded algorithm called the greedy algorithm. This says: at each stage, build 
the cheapest link which joins two towns not already connected by a path. Formally:


(11.3.1) Greedy algorithm for minimal connector
Let G = (V, E) be a connected graph, w a non-negative weight
function on E.
Set S = ∅.
WHILE (V, S) is not connected, choose the edge e of minimal weight
subject to joining vertices in different components, and add e to S.


RETURN (V, S).


This algorithm is just a specialisation of the spanning tree algorithm in the last 
section; so it does indeed produce a spanning tree. We have to show that this 
spanning tree has minimum weight. 
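
In practice the greedy algorithm is usually implemented by sorting the edges once by weight and scanning them in order; this form is generally known as Kruskal's algorithm. The sketch below assumes edges are given as triples (weight, u, v) and uses a simple union-find structure to track components; these representation choices and the function name are ours.

```python
def greedy_min_connector(vertices, weighted_edges):
    """weighted_edges: iterable of (weight, u, v).
    Returns (total_weight, list_of_chosen_edges)."""
    parent = {v: v for v in vertices}

    def find(v):
        # representative of the component containing v
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # shorten the chain as we go
            v = parent[v]
        return v

    tree, total = [], 0
    for w, u, v in sorted(weighted_edges, key=lambda e: e[0]):
        ru, rv = find(u), find(v)
        if ru != rv:  # cheapest remaining edge joining two components
            parent[ru] = rv
            tree.append((u, v))
            total += w
    if len(tree) != len(parent) - 1:
        raise ValueError("graph is not connected")
    return total, tree


towns = "abcd"
cables = [(1, "a", "b"), (4, "a", "c"), (3, "b", "c"), (2, "c", "d"), (5, "b", "d")]
print(greedy_min_connector(towns, cables))
```

Scanning the sorted list once is equivalent to the WHILE loop: an edge rejected because its ends already lie in one component can never become eligible again.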

Let $e_1, e_2, \ldots, e_{n-1}$ be the edges in S, in the order in which the Greedy Algorithm
chooses them. Note that

$w(e_1) \le \cdots \le w(e_{n-1}),$

since if $w(e_j) < w(e_i)$ for j > i, then at the ith stage, $e_j$ would join points in different
components, and should have been chosen in preference to $e_i$.

Suppose, for a contradiction, that there is a spanning tree of smaller weight,
with edges $f_1,\ldots,f_{n-1}$, ordered so that

$w(f_1) \le \cdots \le w(f_{n-1}).$

Thus,

$\sum_{i=1}^{n-1} w(f_i) < \sum_{i=1}^{n-1} w(e_i).$

Choose k as small as possible so that

$\sum_{i=1}^{k} w(f_i) < \sum_{i=1}^{k} w(e_i).$

Note that k > 1, since the greedy algorithm chooses first an edge of smallest weight.
Then we have

$\sum_{i=1}^{k-1} w(f_i) \ge \sum_{i=1}^{k-1} w(e_i);$

hence

$w(f_1) \le \cdots \le w(f_k) < w(e_k).$

Now, at stage k, the greedy algorithm chooses $e_k$, and not any of the edges $f_1,\ldots,f_k$
of strictly smaller weight; so all of these edges must fail the condition that they join
points in different components of (V, S), where $S = \{e_1,\ldots,e_{k-1}\}$. It follows that
the connected components of (V, S’), where $S' = \{f_1,\ldots,f_k\}$, are subsets of those
of (V, S); so (V, S’) has at least as many components as (V, S).

But this is a contradiction, since both (V, S) and (V, S’) are forests, and their
numbers of components are n − (k − 1) and n − k respectively; it is false that
$n - k \ge n - (k-1)$.


In general, the greedy algorithm refers to any algorithm for constructing an
object in stages, where at each stage we make the choice which locally optimises 
some ‘objective function’, subject to the condition that we move closer to our final 
goal. Obviously, this short-sighted local optimisation does not usually produce the 
best overall solution. It is quite remarkable that it does so in this case! (See 
Exercise 3.) 


11.4. Eulerian graphs 


One’s first encounter with graph theory often takes the form of the familiar puzzle
‘trace this figure without taking your pencil off the paper’. Euler’s⁷ experience was
similar. He showed that it was not possible to walk round the town of Königsberg⁸
crossing each of its seven bridges just once. This demonstration is commonly taken
as the starting point of graph theory.⁹

In problems of this sort, we are required to traverse every edge of a graph once, 
but we may revisit a vertex. So the appropriate type of route is a trail (see Section 
11.1). We define an Eulerian trail in a graph to be a trail which includes every edge. 
(A closed Eulerian trail is sometimes called an Eulerian circuit, but this conflicts 
with our definition of a circuit as a closed path.) Clearly an isolated vertex (lying 
on no edges) has no effect, and may be deleted. Also, it is convenient here to work 
in the more general class of multigraphs, where two vertices may be joined by more 
than one edge. Now Euler’s result can be stated thus: 


(11.4.1) Euler’s Theorem. (a) A multigraph with no isolated vertices has a closed 
Eulerian trail if and only if it is connected and every vertex has even valency. 

(b) A multigraph with no isolated vertices has a non-closed Eulerian trail if and 
only if it is connected and has exactly two vertices of odd valency. 


Proof. It’s obvious that a graph with an Eulerian trail must be connected if no
vertex is isolated. The other conditions are also necessary. For consider a graph 
with a closed Eulerian trail. As we follow the circuit, each time we reach a vertex 
by an edge, we must leave it by a different edge, using up two of the edges through 
that vertex; since every edge is used, the valency must be even. The same applies at 
the initial vertex of a closed Eulerian trail, since the first and last edge of the circuit 
play the same role. For a non-closed Eulerian trail, however, the valencies of the 
first and last vertices are odd, since the first and last edges are ‘unpaired’. 


REMARK. According to the Handshaking Lemma (Chapter 2), the number of vertices 
of odd valency in a graph is even. So, if there is a vertex of odd valency, then there 
are at least two. 


7 Euler could be claimed as the founder of combinatorics. He was not the first person to work 
on a combinatorial problem; but he is undoubtedly the mathematician of greatest stature who has 
made a serious contribution to the subject. We saw his encounter with orthogonal Latin squares in 
Chapter 8, and we will meet him again. 


8 Now Kaliningrad. 
9 See, for example, Biggs, Lloyd and Wilson, Graph Theory 1736-1936 (1976). 
Now we turn to the sufficiency of the conditions: we have to construct Eulerian 
trails in graphs satisfying them. The argument is, in some sense, algorithmic. 

So let G = (V, E) be a graph satisfying the condition of either (a) or (b). In 
case (a), let v be any vertex; in (b), let v be one of the vertices of odd valency. Now 
follow a trail from v, never re-using an edge, for as long as possible. Let S be the 
set of edges in this trail. 

For any vertex z other than v (in case (a)) or the other vertex of odd valency (in 
case (b)), whenever the trail reaches z, there are an odd number of edges through 
z not yet used. This is because we reached z along an edge, and previous visits 
accounted for an even number of edges (except for v, where previous visits accounted 
for an odd number of edges). Thus, we don’t get stuck at z; zero is not odd, so 
there is an edge by which we can leave. So the trail must end at v (in case (a)) or 
the other vertex of odd valency (in case (b)). 

If S = E, we have constructed an Eulerian trail, and we are finished. So suppose 
not. There must be a vertex u lying on both an edge in S and an edge not in S. For 
otherwise, the sets X and Y of vertices lying on edges in S, not in S respectively, 
form a partition of V; and no edge joins vertices in different parts, contradicting 
connectedness. 

Moreover, in the graph (V, E \ S), every vertex has even valency. So, starting at 
u and using only edges of E \ S, we can find a closed trail, by the same argument 
as before. Now we can ‘splice in’ this trail to produce a longer one: start at v and 
follow the old trail to u; then traverse the new trail; then continue along the old 
trail. 

After a finite number of applications of this construction, we must arrive at an 
Eulerian trail of the type desired. 

Note that, in case (b), any Eulerian trail must start at one vertex of odd valency 
and finish at the other — a fact well known to anyone who has tried a ‘trace without 
lifting the pencil’ puzzle. 
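
The construction in the proof can be sketched in code. The version below is a minimal sketch, assuming the multigraph is given as a list of vertex pairs (repeated pairs are parallel edges); the function name `eulerian_trail` is hypothetical. It uses the stack-based form of the splicing idea usually attributed to Hierholzer, rather than literally building a trail and splicing closed trails into it, but the result is the same kind of object: a trail using every edge, or `None` when Euler's conditions fail.

```python
from collections import defaultdict

def eulerian_trail(edges):
    """Return an Eulerian trail as a list of vertices, or None if none exists.

    `edges` is a list of (u, v) pairs; repeated pairs model a multigraph.
    """
    if not edges:
        return []
    # Store (neighbour, edge index) so that parallel edges stay distinct.
    adj = defaultdict(list)
    for idx, (u, v) in enumerate(edges):
        adj[u].append((v, idx))
        adj[v].append((u, idx))
    odd = [v for v in adj if len(adj[v]) % 2 == 1]
    if len(odd) not in (0, 2):
        return None  # valency condition of Euler's Theorem fails
    start = odd[0] if odd else edges[0][0]
    used = [False] * len(edges)
    stack, trail = [start], []
    while stack:
        v = stack[-1]
        # Discard edges that were already traversed from their other end.
        while adj[v] and used[adj[v][-1][1]]:
            adj[v].pop()
        if adj[v]:
            w, idx = adj[v].pop()
            used[idx] = True
            stack.append(w)
        else:
            # Stuck at v: v is finished, so it joins the trail (in reverse).
            trail.append(stack.pop())
    # If some edge was never reached, the graph was not connected.
    return trail[::-1] if len(trail) == len(edges) + 1 else None
```

For the seven bridges, with the land masses labelled arbitrarily as A, B, C, D, the call `eulerian_trail([("A","B"),("A","B"),("A","C"),("A","C"),("A","D"),("B","D"),("C","D")])` returns `None`, since all four vertices have odd valency.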

The map of Königsberg is easily converted into a multigraph whose edges are 
the bridges, as shown in Fig. 11.1. All four vertices have odd valency; so there is no 
Eulerian trail. 


Fig. 11.1. The bridges of Königsberg 


11.5. Hamiltonian graphs 


There is a natural analogue for vertices of an Eulerian trail: a Hamiltonian path¹⁰ is 
a path which passes once through each vertex (except that it may be closed, that is, 
its start and finish may be the same). A closed Hamiltonian path is a Hamiltonian 
circuit. A graph possessing such a circuit is called Hamiltonian. 


10 Hamilton's claim to give his name to this concept is much weaker than Euler's claim to Eulerian 
trails. Hamilton demonstrated that the graph formed by the twenty vertices and thirty edges of a 
dodecahedron possesses a Hamiltonian circuit, and patented a puzzle based on this; but he proved 
no general result, and there is some evidence that he got the idea from Kirkman, who made the same 
observation at about the same time. Also, a problem involving a Hamiltonian circuit in a different 
graph, the ‘knight's tour’ on the chessboard, had been solved earlier by (of all people) Euler. 


Clearly multiple edges are irrelevant here; so we may assume that our graphs 
are simple in this section. 

For n > 2, there is a unique graph on n vertices which is connected and divalent 
(regular with valency 2). This is the so-called n-cycle $C_n$. It can be represented as 
the vertices and edges of an n-gon. Now a graph is Hamiltonian if and only if it 
has a cycle as a spanning subgraph. 

Hamiltonian graphs are much harder to deal with than Eulerian ones. There is 
no simple necessary and sufficient condition for a graph to be Hamiltonian, and it 
is notoriously difficult to decide this question for a given graph of even moderate 
size. A lot of effort has gone into proving sufficient conditions. As an example, we 
prove one of the simplest of these conditions, Ore’s Theorem. 


(11.5.1) Ore’s Theorem. Let G be a graph with n vertices, and suppose that, for any 
two non-adjacent vertices x and y in G, the sum of their valencies is at least n. Then 
G is Hamiltonian. 


PROOF. By contrast to Euler’s Theorem, this proof is non-constructive. I’ll remark 
the main points where the non-constructiveness appears. 

Arguing by contradiction, we suppose that G is a graph which satisfies the 
hypothesis of Ore’s Theorem but is not Hamiltonian. We also may suppose that G 
is maximal with these properties, so that the addition of any edge to G produces a 
Hamiltonian graph. (This curious feature of the proof is certainly non-constructive. 
We achieve it by adding new edges joining previously non-adjacent vertices as long 
as G remains non-Hamiltonian. Adding an edge does not decrease the valency of 
any vertex, and does not create any new non-adjacent pair of vertices, so the valency 
condition remains true. But we won't know when G is maximal unless we can test all 
the graphs obtained by adding an extra edge and show that they are Hamiltonian!) 

Now G is certainly not complete, so it has a non-adjacent pair of vertices x and 
y. Since G is maximal non-Hamiltonian, the graph obtained by adding the edge 
e = {x, y} is Hamiltonian; and a Hamiltonian circuit in this graph must contain e. 
So G itself contains a Hamiltonian path 


\[ (x = v_1, v_2, \ldots, v_n = y). \]


(This step is also non-constructive.) 
Now let A be the set of vertices adjacent to x; and let 


\[ B = \{ v_i : v_{i-1} \text{ is adjacent to } y \}. \]


(Since y is not adjacent to $v_n = y$, this set is well-defined.) By assumption, 
$|A| + |B| \ge n$. But the vertex $v_1 = x$ doesn’t belong to either A or B; so 
$|A \cup B| \le n - 1$. It follows that $|A \cap B| \ge 1$, and so there is a vertex $v_i$ lying in both 
A and B. 

Now we obtain a contradiction by constructing a Hamiltonian circuit in G. 
Starting at $x = v_1$, we follow the path $v_2, v_3, \ldots$ as far as $v_{i-1}$. Now $v_{i-1}$ is adjacent 
to y (since $v_i \in B$), so we go to $y = v_n$ at the next step. Then we follow the 
path backwards through $v_{n-1}, \ldots$ as far as $v_i$, and then home to $v_1$ (this edge exists 
because $v_i \in A$). 


The result is best possible in some sense. Consider the graph with n = 2m + 1 
vertices $a_1, \ldots, a_m, b_1, \ldots, b_{m+1}$, and having as edges all pairs $\{a_i, b_j\}$. (This is a 
complete bipartite graph.) It is not Hamiltonian; for any edge crosses between the 
sets $A = \{a_1, \ldots, a_m\}$ and $B = \{b_1, \ldots, b_{m+1}\}$, and so a path of odd length starting 
in A must finish in B and cannot be closed. But two non-adjacent vertices are both 
in B or both in A, and the sum of their valencies is $2m = n - 1$ or $2m + 2 = n + 1$ 
respectively. 
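
For small cases the example can be checked by machine. The following is a rough sketch, assuming vertices are numbered 0, ..., n − 1 and the graph is given as a list of edges; the helper names `ore_condition`, `is_hamiltonian` and `complete_bipartite` are hypothetical, and the Hamiltonicity test is a plain exhaustive search that is only usable for very small n.

```python
from itertools import combinations, permutations

def ore_condition(n, edges):
    """True if every non-adjacent pair of vertices has valency sum at least n."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return all(len(adj[u]) + len(adj[v]) >= n
               for u, v in combinations(range(n), 2) if v not in adj[u])

def is_hamiltonian(n, edges):
    """Exhaustive search for a Hamiltonian circuit (tiny graphs only)."""
    if n < 3:
        return False
    edge_set = {frozenset(e) for e in edges}
    for perm in permutations(range(1, n)):      # the circuit may start at vertex 0
        cycle = (0,) + perm
        if all(frozenset((cycle[i], cycle[(i + 1) % n])) in edge_set
               for i in range(n)):
            return True
    return False

def complete_bipartite(m):
    """Edges of the example: part A is 0..m-1, part B is m..2m (m + 1 vertices)."""
    return [(a, b) for a in range(m) for b in range(m, 2 * m + 1)]
```

With m = 2 (so n = 5), `ore_condition(5, complete_bipartite(2))` and `is_hamiltonian(5, complete_bipartite(2))` are both false: two vertices of the larger part have valency sum 4 = n − 1, just short of the hypothesis.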

Nevertheless, there are a great many results which strengthen Ore’s Theorem by 
varying the hypotheses slightly. 


11.6. Project: Gray codes 


An analog-to-digital converter is a device that takes a continuously-varying real number and 
converts it to an integer, ideally the integer part or the nearest integer. The result is presented in the 
standard way, usually to base 10 (in an odometer or gas meter) or base 2 (in an electronic device 
connected directly to a computer). 

We considered the operation of an odometer in Chapters 2 and 4. There are points in its 
operation where several digits must change simultaneously. Owing to mechanical limitations, the 
change is not quite simultaneous. Thus, a reading taken at this point may involve a considerable 
error. For example, in the course of changing from 36999 to 37000, the reading could be as low as 
36000 or as high as 37999; and even if we assume that the digits change sequentially from the right, 
the low value 36000 is still a possible reading. 

To eliminate this error, we need to arrange the numbers in order (different from the usual order) 
so that only one digit changes at each step. If this can be done, the only possible error will arise 
from a time delay in the mechanical operation of the device, and will be at most 1. In the case of 
binary representation, such a sequence is known as a Gray code. It has a natural graph-theoretic 
interpretation. 

The n-cube $Q_n$ is the graph whose vertices are all n-tuples $x_{n-1} \ldots x_0$ of zeros and ones, two 
vertices being adjacent if they agree in all but one position. (Note that there are $2^n$ vertices, which 
we write as the binary representations of the integers from 0 to $2^n - 1$. The n-cube consists of the 
vertices and edges of the familiar regular polytope of the same name in $\mathbb{R}^n$.) Now a Gray code is 
the same thing as a Hamiltonian path in the n-cube. For n = 1, the graph $Q_1$ is a single edge, and 
trivially has a Hamiltonian path. But for $n \ge 2$, we can do better: 


(11.6.1) Theorem. For $n \ge 2$, the graph $Q_n$ is Hamiltonian. 


The proof is by induction. For n = 2, $Q_2$ is the 4-cycle $C_4$, and the assertion is true: we fix the 
Hamiltonian circuit (00, 01, 11, 10, 00). Suppose that $Q_n$ has a Hamiltonian circuit $(v_0, \ldots, v_{2^n-1})$. 
Then 


\[ (0v_0, 0v_1, \ldots, 0v_{2^n-2}, 0v_{2^n-1}, 1v_{2^n-1}, 1v_{2^n-2}, \ldots, 1v_1, 1v_0, 0v_0) \]


is a Hamiltonian circuit in $Q_{n+1}$. 
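
The inductive step translates directly into a short program. This is a minimal sketch, assuming vertices of the cube are written as strings of 0s and 1s; the function name `gray_circuit` is hypothetical. It returns the vertices in circuit order without repeating the starting vertex at the end.

```python
def gray_circuit(n):
    """Vertices of the n-cube in the order of the inductive Hamiltonian circuit."""
    if n < 2:
        raise ValueError("the construction starts at n = 2")
    circuit = ["00", "01", "11", "10"]          # the fixed circuit in the 2-cube
    for _ in range(n - 2):
        # Prefix 0 to the circuit, then prefix 1 to the circuit read backwards.
        circuit = (["0" + v for v in circuit]
                   + ["1" + v for v in reversed(circuit)])
    return circuit
```

For example, `gray_circuit(3)` gives 000, 001, 011, 010, 110, 111, 101, 100, and the last vertex differs from the first in one position, so the path closes up.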
The case n = 3 is shown in Fig. 11.2. 


Fig. 11.2. Hamiltonian circuit in the 3-cube 


For practical use, it is necessary to be able to ‘encode’ and ‘decode’ this code. That is, we have 
to be able to calculate the $N$th term of the Gray code, and the position of (the binary representation 
of) N in the code. (Note that these statements make sense independently of n, as long as 
$2^n > N$; for the Gray code of length $2^{n+1}$ begins with the Gray code of length $2^n$, each term 
preceded by 0, which doesn’t change the integers they represent.) The key observation is that, if 
$2^i$ is the exact power of 2 dividing N, then the digit which changes at the $N$th step 
is the $i$th. (This is easily proved by induction: in our construction, the $n$th digit changes only 
at the $(2^n)$th step.) At the same step in the odometer, the 0th, 1st, ..., $i$th digits all change. From this 
observation, it is not difficult to prove the following assertions; the reader is encouraged to supply 
proofs: 


Let $x_{n-1} \ldots x_0$ be the binary representation of N. For $i = 0, \ldots, n-1$, let 
$y_i = x_i + x_{i+1} \pmod 2$; that is, $y_i = 0$ if $x_i = x_{i+1}$, $y_i = 1$ otherwise. (By 
convention, $x_n = 0$.) Let $y_{n-1} \ldots y_0$ be the binary representation of M. Then 
the number in the $N$th position in the Gray code is M. 


Let $y_{n-1} \ldots y_0$ be the binary representation of M. For $i = 0, \ldots, n-1$, let $x_i$ be 
the number of ones in the set $\{y_i, \ldots, y_{n-1}\}$, taken mod 2, and let $x_{n-1} \ldots x_0$ be 
the binary representation of N. Then the number M occurs in the $N$th position 
in the Gray code. 
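
Both rules have compact bitwise forms. The sketch below assumes non-negative integers and counts positions from 0; the names `gray_encode` and `gray_decode` are hypothetical. Adding each binary digit of N to the digit on its left, mod 2, is the same as taking the exclusive-or of N with N shifted right by one place; and the parity of the number of ones from digit i upwards is the exclusive-or of all the right shifts of M.

```python
def gray_encode(N):
    """Number in position N of the Gray code: digit i is x_i + x_(i+1) mod 2."""
    return N ^ (N >> 1)

def gray_decode(M):
    """Position of M in the Gray code: digit i is the parity of y_i, ..., y_(n-1)."""
    N = 0
    while M:
        N ^= M      # accumulate M, M >> 1, M >> 2, ... by exclusive-or
        M >>= 1
    return N
```

For instance, `[gray_encode(N) for N in range(8)]` is `[0, 1, 3, 2, 6, 7, 5, 4]`, matching the circuit 000, 001, 011, 010, 110, 111, 101, 100 in the 3-cube.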


11.7. The Travelling Salesman 


A salesman for the Acme Widget Corporation¹¹ has to visit all n cities in a country 
on business. The distance between each pair of cities is known. She wants to 
minimise the total distance travelled, and return to her starting point. 

This is the notorious Travelling Salesman Problem (TSP). In graph-theoretic 
terminology, it asks for the Hamiltonian circuit of smallest weight in an edge- 
weighted complete graph. 

In fact, there is no real loss in restricting to the complete graph. For a general 
edge-weighted graph, simply add new edges with ridiculously large weights, so that 
these cannot occur in any minimum weight circuit. 

Indeed, the Hamiltonian circuit problem for a given graph G is a special case of 
the Travelling Salesman problem. If G has n vertices, assign the weight 1 to an edge 
of the complete graph $K_n$ if it is an edge of G, and 2 otherwise. Then the minimum 
weight of a Hamiltonian circuit of $K_n$ is n if and only if G has a Hamiltonian 
circuit. 


11 Widgets are generic industrial products in Operational Research problems. 

The existence of an algorithm to solve this problem is not in doubt. 


(11.7.1) (Slow) Algorithm for Travelling Salesman 
• Generate all permutations of {1,...,n} (see Chapter 3). 
• For each permutation $\pi$, calculate 

\[ \sum_{i=1}^{n-1} w(\{i\pi, (i+1)\pi\}) + w(\{n\pi, 1\pi\}). \]

• Return the smallest value. 


The disadvantage is that there are n! permutations; for even moderate values of 
n (say, n = 50), this number is so large that the method cannot be contemplated. 

Some small improvements can be made. For example, we can assume that the 
circuit starts at vertex 1, so that $1\pi = 1$; this saves a factor of n. Unfortunately, 
nobody knows how to do substantially better! 
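
The slow algorithm, with the starting vertex fixed, might be written as below. This is only a sketch: it assumes the weights are supplied as a symmetric n × n matrix `w` indexed from 0 (so vertex 0 plays the part of vertex 1), and the function name `tsp_brute_force` is hypothetical.

```python
from itertools import permutations

def tsp_brute_force(w):
    """Return (minimum weight, circuit) over all Hamiltonian circuits.

    `w` is a symmetric n x n matrix of edge weights; the circuit starts at vertex 0.
    """
    n = len(w)
    best = None
    for perm in permutations(range(1, n)):      # (n - 1)! orderings of the rest
        tour = (0,) + perm
        # Weight of the circuit, including the closing edge back to the start.
        total = sum(w[tour[i]][tour[(i + 1) % n]] for i in range(n))
        if best is None or total < best[0]:
            best = (total, tour)
    return best
```

The same function illustrates the reduction from the Hamiltonian circuit problem: give weight 1 to the edges of G and 2 to the non-edges, and compare the minimum with n. For the path 0, 1, 2, 3 (which has no Hamiltonian circuit) the minimum is 5 rather than 4.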

Because of the practical importance of the problem (not just for sales depart- 
ments, but for other applications such as design of circuits), some compromises have 
been reached. Methods which deliver an approximate solution have been developed. 


Out of a huge literature, I have selected one example, chosen because it uses concepts 


(11.7.2) Twice-round-the-Tree Algorithm 
(An approximate solution to the Travelling Salesman) 
• Find a minimal connector S. 
• In the multigraph obtained by duplicating each edge of S, find 
a closed Eulerian trail. 
• Follow this trail, but whenever the next step would involve 
revisiting a vertex, go instead to the first unvisited vertex on the 
trail. When every vertex has been visited, return to the start. 


In the second step, every edge in S is duplicated, resulting in a connected graph 
with all valencies even; so there does indeed exist a closed Eulerian trail, and we 
have seen an algorithm for finding one. 


It is clear that this algorithm produces a Hamiltonian circuit. How good is it? 


We say that an edge-weighted complete graph satisfies the triangle inequality if, 
for any three vertices a, b, c, we have 


\[ w(\{a,b\}) + w(\{b,c\}) \ge w(\{a,c\}). \]

This condition certainly holds if the weights are distances between towns.¹² 


12 Or much more general distances. Under minimal assumptions, the shortest route from a to c 
cannot be longer than a route via b. 
(11.7.3) Theorem. Let G be an edge-weighted complete graph, m the weight of a 
minimal connector, M the smallest weight of a Hamiltonian circuit. Then 

(a) $m \le M$; 

(b) if G satisfies the triangle inequality, then $M \le 2m$. 


PROOF. (a) is clear, since a Hamiltonian circuit is certainly connected. (Indeed, it 
remains connected when any edge is deleted, so its weight is at least the sum of m 
and the smallest edge weight in G. This can be further improved.) 

For (b), note that the weight of the closed Eulerian trail in the second stage of 
the algorithm is equal to 2m. Now, in the third stage, we take various short cuts, 
replacing a path $v_i, \ldots, v_j$ by a single edge from $v_i$ to $v_j$. By the triangle inequality 
(and induction), the length of the edge doesn’t exceed the length of the path. So 
we end up with a Hamiltonian circuit of weight at most 2m, giving a constructive 
proof of the inequality. 
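
The Twice-round-the-Tree Algorithm can be sketched as follows, under some stated assumptions. The weights are given as a symmetric matrix `w` indexed from 0; the minimal connector is found here by Prim's algorithm, which is swapped in for convenience (any method of finding a minimal connector would do); and the names `minimal_connector` and `twice_round_the_tree` are hypothetical. A depth-first walk round the tree goes along each tree edge twice, once down and once back, so it is a closed Eulerian trail of the doubled tree; listing the vertices in order of first visit is then exactly the short-cutting step.

```python
import math

def minimal_connector(w):
    """Prim's algorithm. Returns (children lists rooted at vertex 0, total weight)."""
    n = len(w)
    in_tree = [False] * n
    best = [math.inf] * n          # cheapest known edge joining each vertex to the tree
    parent = [None] * n
    best[0] = 0
    children = {v: [] for v in range(n)}
    total = 0
    for _ in range(n):
        v = min((u for u in range(n) if not in_tree[u]), key=lambda u: best[u])
        in_tree[v] = True
        total += best[v]
        if parent[v] is not None:
            children[parent[v]].append(v)
        for u in range(n):
            if not in_tree[u] and w[v][u] < best[u]:
                best[u], parent[u] = w[v][u], v
    return children, total

def twice_round_the_tree(w):
    """Return (circuit, its weight, weight m of the minimal connector)."""
    children, m = minimal_connector(w)
    tour, stack = [], [0]
    while stack:                   # depth-first walk, recording first visits only
        v = stack.pop()
        tour.append(v)
        stack.extend(reversed(children[v]))
    n = len(w)
    weight = sum(w[tour[i]][tour[(i + 1) % n]] for i in range(n))
    return tour, weight, m
```

When the weights are Euclidean distances between points in the plane, the triangle inequality holds, and the returned circuit weight lies between m and 2m, in line with (11.7.3).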


Another celebrated problem bears the same relation to closed Eulerian trails as 
the Travelling Salesman does to Hamiltonian circuits. This is the Chinese postman 
problem: Given an edge-weighted connected graph, find the closed walk of minimum 
weight which uses every edge of the graph. (The postman must pass along every 
street delivering letters.) If the graph G is Eulerian, then a closed Eulerian trail is 
the solution. If not, then some edges must be traversed more than once. There is an 
efficient algorithm for this problem. 


11.8. Digraphs 


The most important variant of graphs consists of directed graphs or digraphs, where 
the edges are ordered pairs of vertices (rather than unordered pairs). Each edge 
(x,y) has an initial vertex x and a terminal vertex y. Note that (x,y) and (y,x) are 
different edges. 

With any digraph D is associated an ordinary (undirected) graph, the underlying 
graph: it has the same vertex set as D, and its edges are those of D without the 
order (that is, {x,y} for each edge (x,y) of D). The underlying graph will fail to be 
simple if D contains two oppositely-directed edges (such as (x,y) and (y,x)). If the 
underlying graph is simple, then D is called an oriented graph. 


The definitions of the various types of route in a digraph are the same as in a 
graph, with the important exception that the edges must be traversed in the correct 
direction: so, if 

\[ (v_0, e_1, v_1, \ldots, e_n, v_n) \]


is a trek, trail, or path, then $e_i$ is the edge $(v_{i-1}, v_i)$ for $i = 1, \ldots, n$. In a digraph, 
we cannot immediately retrace an edge, and so every walk is a trek. (11.1.1) holds 
without modification for digraphs. 

The situation with connected components is different, however. If, as before, 
we let R be the relation defined by the rule that $(x, y) \in R$ if there is a path (or 
trail, or trek) from x to y, then the relation R is reflexive and transitive, but not 
necessarily symmetric; so it is a partial preorder but not necessarily an equivalence 
relation. (See Sections 3.8-9 and Exercise 18 of Chapter 3 for partial preorders and 
their connection with equivalence relations.) Accordingly, we define two types of 
connectedness: 

• the digraph D is (weakly) connected if its underlying graph is connected; 

• D is strongly connected if, for any two vertices x, y, there is a path from x to y. 
It’s clear that a strongly connected digraph is weakly connected. The converse is 
false.¹³ 

The definitions of Eulerian trail and circuit and of Hamiltonian path and circuit 
are just what you expect. A digraph possessing a Hamiltonian circuit is obviously 
strongly connected; one with a Hamiltonian path is weakly (but not necessarily 
strongly) connected. Similar statements hold for Eulerian trails and circuits. 

The analogue of Euler's theorem runs as follows: 


(11.8.1) Euler’s Theorem for digraphs. A digraph with no isolated vertices has a 
closed Eulerian trail if and only if it is weakly connected and the in-valency and 
out-valency of any vertex are equal. 


You are invited to prove this, and to formulate and prove a necessary and 
sufficient condition for the existence of a non-closed Eulerian trail. 
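
The two kinds of connectedness, and the condition in (11.8.1), are easy to test by machine. The sketch below assumes a digraph is given as a non-empty collection of vertices and a list of ordered pairs, with no isolated vertices; all function names are hypothetical. It only checks the stated condition for a closed Eulerian trail and does not attempt the non-closed case.

```python
from collections import defaultdict

def reachable(start, succ):
    """Set of vertices reachable from `start` using the successor map `succ`."""
    seen, stack = {start}, [start]
    while stack:
        v = stack.pop()
        for w in succ[v]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen

def weakly_connected(vertices, edges):
    und = defaultdict(set)                  # underlying graph: forget directions
    for x, y in edges:
        und[x].add(y)
        und[y].add(x)
    return reachable(next(iter(vertices)), und) == set(vertices)

def strongly_connected(vertices, edges):
    fwd, back = defaultdict(set), defaultdict(set)
    for x, y in edges:
        fwd[x].add(y)
        back[y].add(x)
    root = next(iter(vertices))
    # Every vertex is reachable from root, and root is reachable from every vertex.
    return reachable(root, fwd) == set(vertices) == reachable(root, back)

def has_closed_eulerian_trail(vertices, edges):
    """Condition of (11.8.1): weakly connected, in-valency equals out-valency."""
    indeg, outdeg = defaultdict(int), defaultdict(int)
    for x, y in edges:
        outdeg[x] += 1
        indeg[y] += 1
    return (weakly_connected(vertices, edges)
            and all(indeg[v] == outdeg[v] for v in vertices))
```

For the directed path with edges (1, 2) and (2, 3), the digraph is weakly but not strongly connected; adding the edge (3, 1) makes it strongly connected and gives every vertex in-valency and out-valency 1.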


11.9. Networks 


A network is an edge-weighted digraph possessing two distinguished vertices, the 
source s and the target t, with $s \ne t$. The weight of an edge e is referred to as its 
capacity, and denoted by c(e). 

A good model to keep in mind is a hydraulic network consisting of pipes and 
junctions. Fluid is pumped in at the source and out at the target; the capacity of 
an edge reflects the maximum rate of flow possible in that pipe. Of course, much 
wider interpretations are also possible, such as the movement of commercial product 
through distribution systems between factories, warehouses, etc. 


In accordance with this interpretation, we define a flow in a network to be a 
function f from the edge set to the non-negative real numbers, satisfying the two 
properties 

• $0 \le f(e) \le c(e)$ for all edges e; 

• $\sum_{\iota(e)=v} f(e) = \sum_{\tau(e)=v} f(e)$ for all vertices $v \ne s, t$. 
Here c is the capacity; s and t the source and target; and, for any edge e, $\iota(e)$ and 
$\tau(e)$ denote the initial and terminal vertices of e. Thus, the first condition asserts 
that the flow in any edge is non-negative and doesn’t exceed the capacity of the 
edge; the second asserts that, for any vertex v other than the source and target, the 
flow out of v is equal to the flow into v, so there is no net accumulation at any 
point. 

13 It is possible to imagine a town with one-way streets in which you can drive from x to y but not 
from y to x (but very impracticable!) 


The value of a flow f is defined to be 


\[ \mathrm{val}(f) = \sum_{\iota(e)=s} f(e) - \sum_{\tau(e)=s} f(e); \]


that is, the net flow out of the source. It turns out to be equal to the net flow into the 
target, as the next result shows. For any set S of vertices, we use the notation 


\[ S^{\rightarrow} = \{ e : \iota(e) \in S,\ \tau(e) \notin S \}, \]
\[ S^{\leftarrow} = \{ e : \iota(e) \notin S,\ \tau(e) \in S \}. \]


(11.9.1) Proposition. Let f be a flow in a network, S a set of vertices containing the 
source but not the target. Then 


\[ \sum_{e \in S^{\rightarrow}} f(e) - \sum_{e \in S^{\leftarrow}} f(e) = \mathrm{val}(f). \]


PROOF. To show this, we calculate 


\[ \sum_{v \in S} \Bigl( \sum_{\iota(e)=v} f(e) - \sum_{\tau(e)=v} f(e) \Bigr). \]


On one hand, this is equal to val(f), since the term of the outer sum with v = s 
is equal to val(f), while the other terms are all zero by definition of a flow. 

On the other hand, consider this as a sum over edges. Let e = (v, w) be an 
edge. If $v \in S$, then f(e) occurs in the term of the outer sum corresponding to v; if 
$w \in S$, then $-f(e)$ occurs in the term corresponding to w. Thus, only those edges 
with exactly one end in S, viz., those in $S^{\rightarrow}$ and $S^{\leftarrow}$, contribute to the sum, and 
their contributions are f(e) and $-f(e)$ respectively. 

Now take $S = V \setminus \{t\}$, where V is the vertex set; then $S^{\rightarrow} = \{e : \tau(e) = t\}$ and 
$S^{\leftarrow} = \{e : \iota(e) = t\}$, and so the net flow into t is equal to val(f). 


The main question about a network is: 
What is the maximum value of a flow in the network? 


A cut in a network is a set C of edges with the property that any path from the 
source to the target contains an edge in C. Its capacity cap(C) is the sum of the 
capacities of its edges. Intuitively, it is clear that the capacity of a cut is an upper 
bound for the value of any flow. We will show this and more: 


(11.9.2) Max-Flow Min-Cut Theorem 
The maximum value of a flow in a network is equal to the minimum 
capacity of a cut. 


This important theorem has a number of consequences, including Hall’s Marriage 
Theorem and Menger’s Theorem on paths in graphs. The proof of the 
Max-Flow Min-Cut Theorem is in part algorithmic. More precisely, the proof is 
algorithmic in the case when all the capacities are integers, and we prove something 
more: 


(11.9.3) Integrity Theorem 
Suppose that the capacity of every edge in a network is an integer. 
Then there is a flow of maximum value, such that the flow in every 
edge is an integer. 


The integral case is the important one; we'll see that the general case can be 
deduced from it by quite different (and non-constructive) methods. 


Our first task is to prove: 


\[ \left( \text{the value of any flow} \right) \le \left( \text{the capacity of any cut} \right). \]

It is enough to prove this for minimal cuts (those for which, if any edge is 
removed, the result is not a cut). So let C be a minimal cut. Define S to be the set 
of vertices v for which there exists a path from s to v using no edge of C. Then 
$C \subseteq S^{\rightarrow}$. (If e is any edge in C then, by minimality, there is a path from s to t using 
the edge e and no other edge of C; so $\iota(e) \in S$ and $\tau(e) \notin S$.) Conversely, every edge 
of $S^{\rightarrow}$ lies in C by the definition of S, so in fact $C = S^{\rightarrow}$. Now, if f is any flow, 
then 


\[ \mathrm{val}(f) = \sum_{e \in S^{\rightarrow}} f(e) - \sum_{e \in S^{\leftarrow}} f(e) \le \sum_{e \in S^{\rightarrow}} c(e) = \mathrm{cap}(C). \]

Now we treat the case where all capacities are integers. We prove the following: 


If all capacities of edges in a network are integers, then there is a 
flow f, with integer values on all edges, and a cut C, such that 


val(f) = cap(C). 


By what we just proved, f is a maximal flow and C a minimal cut; so the Max- 
Flow Min-Cut Theorem (in this case) and the Integrity Theorem will be proved. 


The proof involves showing the following. 
Let all capacities of edges in a network be integers, and let f be an 
integer-valued flow. Then either 


• there is an integer-valued flow f' with val(f') = val(f) + 1; or 
• there is a cut C with cap(C) = val(f). 


Now we can start with any flow, and apply this result successively. As long as 
the first alternative holds, the value of the flow is increased. So eventually the second 
alternative becomes true, and we have finished. In order to prove the theorem, we 
can start with the zero flow (the zero function is always a flow!); but in practice it 
is usually possible to spot a starting flow which is close to maximal, and shorten the 
calculation. The proof of the assertion is algorithmic. 
So let f be an integer-valued flow in a network with integer capacities. Perform 
the following algorithm. 

Set S = {s}. 
WHILE there is an edge e = (v, w) with either 

• $f(e) < c(e)$, $v \in S$, $w \notin S$, or 

• $f(e) > 0$, $v \notin S$, $w \in S$, 
add to S the vertex of e it doesn’t already contain (w or v respectively). 
RETURN S. 


Now there are two cases, according as $t \in S$ or not. 


Case 1. $t \in S$. By construction of S, it follows that there is a path from s to t in the 
underlying graph, say $(v_0 = s, v_1, \ldots, v_d = t)$, such that, for each i, either 

(a) $(v_{i-1}, v_i)$ is an edge e of the network with $f(e) < c(e)$; or 

(b) $(v_i, v_{i-1})$ is an edge e of the network with $f(e) > 0$. 
Let A and B be the sets of edges of the digraph appearing under cases (a) and 
(b) respectively. Now define a new flow f' by the rule 


\[ f'(e) = \begin{cases} f(e) + 1 & \text{if } e \in A; \\ f(e) - 1 & \text{if } e \in B; \\ f(e) & \text{otherwise.} \end{cases} \]

We have to show that this is a flow, and that its value is one more than that of f. 
The first axiom for a flow, that $0 \le f'(e) \le c(e)$ for all e, holds because all capacities 
and flow values are integers, so (for example) if $f(e) < c(e)$, then $f(e) + 1 \le c(e)$. 
The second axiom requires some case checking. Let $v_i$ be a vertex on the path (no 
vertex off the path is affected); suppose that $i \ne 0, d$. If $(v_{i-1}, v_i)$ and $(v_i, v_{i+1})$ are 
both edges, then the net flow into $v_i$ and the net flow out of $v_i$ are both increased 
by 1, and the flows still balance. The other cases are similar. Also similar is the fact 
that the value of the flow is increased by 1. 


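Case 2. $t \notin S$. Then $S^{\rightarrow}$ is a cut. Also, by definition of S, if $e \in S^{\rightarrow}$, then $f(e) = c(e)$, 
and if $e \in S^{\leftarrow}$, then $f(e) = 0$ (else the algorithm would enlarge S). So 


\[ \mathrm{val}(f) = \sum_{e \in S^{\rightarrow}} f(e) - \sum_{e \in S^{\leftarrow}} f(e) = \sum_{e \in S^{\rightarrow}} c(e) = \mathrm{cap}(S^{\rightarrow}), \]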

as required. 

This completes the proof in the integral case. 
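
The whole argument in the integral case amounts to a procedure: grow S, and either augment the flow by 1 along the path found or stop with a cut. Here is a minimal sketch of it, assuming the network is given as a dictionary from ordered pairs (v, w) to integer capacities (so there is at most one edge in each direction between two vertices); the function name `max_flow_min_cut` and the bookkeeping dictionary `pred` are hypothetical. It starts from the zero flow and raises the value one unit at a time, exactly as in the proof, so it is not meant to be efficient.

```python
def max_flow_min_cut(capacity, s, t):
    """Return (flow, S): an integer maximum flow and the final set S with t not in S.

    `capacity` maps each edge (v, w) to a non-negative integer.
    """
    f = {e: 0 for e in capacity}                # the zero function is a flow
    while True:
        # Grow S; pred[x] records the edge and direction by which x joined S.
        pred = {s: None}
        grew = True
        while grew and t not in pred:
            grew = False
            for (v, w), c in capacity.items():
                if v in pred and w not in pred and f[(v, w)] < c:
                    pred[w] = ((v, w), +1)      # unsaturated edge leaving S
                    grew = True
                elif w in pred and v not in pred and f[(v, w)] > 0:
                    pred[v] = ((v, w), -1)      # edge into S carrying flow
                    grew = True
        if t not in pred:
            return f, set(pred)                 # Case 2: the edges leaving S form a cut
        # Case 1: walk back from t, adding 1 on forward edges, subtracting 1 on backward ones.
        x = t
        while pred[x] is not None:
            e, sign = pred[x]
            f[e] += sign
            x = e[0] if sign == +1 else e[1]
```

On return, the value of the flow equals the total capacity of the edges with initial vertex in S and terminal vertex outside S, which is the equality val(f) = cap(C) of the proof.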

The rest of the proof of the Max-Flow Min-Cut Theorem is quite different (and 
of less interest). It parallels the construction of the real numbers from the integers: 
first we construct the rational numbers by division, and then we construct the reals 
by an analytic process (typically Cauchy sequences or Dedekind cuts). 

So suppose first that all capacities are rational. By multiplying by the least 
common multiple m of the denominators of these rationals, we obtain a new network 
in which all capacities are integers. By the previous case, the Max-Flow Min-Cut 
Theorem holds for the new network; hence it also holds for the old one (on dividing 
the flow values by the same number m). 

Finally, suppose that the capacities are real numbers. We can approximate them 
arbitrarily closely from below by rational numbers, and hence find flows whose 
values are arbitrarily close to the capacity of a minimal cut. Then the result follows 
by a limiting process.¹⁴ 

This concludes the proof of the Max-Flow Min-Cut Theorem. 


11.10. Menger, Konig and Hall 


The Max-Flow Min-Cut Theorem, in combination with the Integrity Theorem, is 
a very powerful tool in graph theory. The key to its application is to consider an 
arbitrary directed graph with distinguished vertices s and t as a network in which 
each edge has capacity 1. Now, in an integer-valued flow in this network, the flow in 
any edge must be 0 or 1; so the flow ‘picks out’ a subset of the edges, those carrying 
a flow of 1. Now, if the value of the flow is m, then there are m edge-disjoint paths 
from s to t. (This is proved by induction on m. Starting from s and using only 
edges with positive flow, never re-using an edge, we eventually arrive at t, having 
constructed a trek from s to t. Deleting circuits between repeated vertices, we obtain 
a path from s to t. Now, if we reduce the flow in the edges of this path to 0, the 
value of the flow is decreased by 1. By induction, we can find m — 1 edge-disjoint 
paths among the remaining edges. So the claim is proved.) 

This conclusion can be put in the following form, where an st-separating set of 
edges is a set C such that every path from s to t uses an edge of C: 


(11.10.1) Menger’s Theorem. Let s and t be vertices of a digraph D. Then the 
maximum number of pairwise edge-disjoint paths from s to t is equal to the 
minimum number of edges in an st-separating set. 


Menger’s Theorem also has a version for undirected graphs, and versions which 
refer to vertices instead of edges. You can read about these in Beineke and Wilson, 
Selected Topics in Graph Theory. 


Further results involve more specific digraphs. A very important class of digraphs 
are those derived from bipartite graphs. 

A graph G = (V, E) is bipartite if there is a partition of the vertex set into two 
parts A and B such that every edge has one end in A and the other end in B. The 
partition {A,B} is called a bipartition of G. 


14 There is an additional subtlety here. We construct a sequence of flows whose values converge to 
cap(C), where C is a minimal cut. Now the flows can be regarded as points in a Euclidean space 
whose dimension is equal to the number of edges. Moreover, they lie in a closed and bounded region 
of the space. Such a region is compact; so, by the Bolzano—Weierstrass Theorem, the sequence of 
flows has a convergent subsequence. The limit of this subsequence is a flow whose value is equal to 
cap(C). See Chapter 10, Exercise 5, for the 1-dimensional Bolzano—Weierstrass Theorem; the general 
case is proved coordinatewise. 
Given a bipartite graph G with bipartition {A,B}, we construct a network as 
follows. The vertex set is $\{s, t\} \cup A \cup B$, where the source s and target t are new 
vertices. The edges are 

• all pairs (s, a) for $a \in A$; 

• all pairs (b, t) for $b \in B$; 

• all pairs (a, b), with $a \in A$, $b \in B$, for which {a,b} is an edge of G. 
I will call this network N(G). 

In order to interpret flows and cuts in N(G), we need two definitions. In a graph 
G, a matching is a set of pairwise disjoint edges; and an edge-cover is a set S of 
vertices with the property that every edge contains a vertex of S. 

Now a path from s to t in N(G) has the form (s, a, b, t), where $a \in A$, $b \in B$, 
and {a,b} is an edge of G. So a set of edge-disjoint paths $(s, a_i, b_i, t)$ in N(G) arises 
from a matching in G consisting of the edges $\{a_i, b_i\}$, $i = 1, \ldots, m$. 

An edge-cover S in G gives rise to a cut in N(G), consisting of the edges (s, a) 
for $a \in S \cap A$, together with the edges (b, t) for $b \in S \cap B$. (Any path from s 
to t must use an edge of G, and hence pass through a vertex of S, since S is an 
edge-cover.) Now there may be other cuts, containing some edges of the form (a, b); 
but none of these can be smaller than all those of the first type. For let S be an 
arbitrary cut. Replace every edge (a, b) in S ($a \in A$, $b \in B$) with the edge (s, a), 
deleting repetitions; the result is a cut containing edges of the form (s, a) and (b, t) 
only. We conclude that 


The size of the smallest cut in N(G) is equal to the size of the 
smallest edge-cover in G. 


Hence we conclude: 


(11.10.2) König’s Theorem. The maximum size of a matching in the bipartite graph 
G is equal to the minimum size of an edge-cover in G. 
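As a sanity check on (11.10.2), the following Python sketch compares the two quantities by exhaustive search on a small example. The function names and the sample graph are illustrative assumptions, not part of the text, and the search is exponential, so it is only suitable for tiny graphs.

```python
from itertools import combinations

def max_matching_size(edges):
    """Largest number of pairwise disjoint edges, found by exhaustive search."""
    for r in range(len(edges), 0, -1):
        for subset in combinations(edges, r):
            endpoints = [v for e in subset for v in e]
            if len(endpoints) == len(set(endpoints)):  # no shared vertex
                return r
    return 0

def min_edge_cover_size(edges):
    """Smallest set of vertices meeting every edge (an 'edge-cover' in the text's sense)."""
    vertices = sorted({v for e in edges for v in e})
    for r in range(len(vertices) + 1):
        for cover in combinations(vertices, r):
            chosen = set(cover)
            if all(a in chosen or b in chosen for a, b in edges):
                return r
    return len(vertices)

# Hypothetical bipartite graph with A = {a1, a2, a3} and B = {b1, b2}.
edges = [("a1", "b1"), ("a1", "b2"), ("a2", "b1"), ("a3", "b1")]
print(max_matching_size(edges), min_edge_cover_size(edges))  # 2 2

# A triangle is not bipartite, and the two numbers differ (1 versus 2).
triangle = [(1, 2), (2, 3), (1, 3)]
print(max_matching_size(triangle), min_edge_cover_size(triangle))  # 1 2
```

The triangle shows that the bipartite hypothesis in the theorem cannot simply be dropped.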


Finally, we will show that Hall’s Marriage Theorem is a consequence of König’s 
Theorem.¹⁵ 


In order to do this, we have to translate a family of sets into a bipartite graph. 
This is a common and important procedure. 


Let F = (A_1, ..., A_n) be a family of subsets of {1, ..., m}. We define the 
incidence graph G of F as follows. The vertex set V of G is the union of two parts 
A = {1, ..., m} and B = {A_1, ..., A_n}; and the vertices i ∈ A and A_j ∈ B are 
joined if and only if i ∈ A_j. 

The incidence graph is clearly bipartite; the sets A and B used in its definition 
form a bipartition. If the dual rôle played by the vertices (which are also sets 
or elements of sets) is confusing, you may take A to be a set in one-to-one 
correspondence with {1,...,m}, and B a set in one-to-one correspondence with F. 

Now a matching in the incidence graph G is a set of disjoint edges {i, A_j}; thus, 
each point i lies in its corresponding set A_j, and the points are all distinct, as are 
the sets. This is just a system of distinct representatives for a subfamily of F. So F 
has an SDR if and only if there is a matching whose edges contain all vertices of B. 


15 In fact, König’s Theorem was proved before Hall’s, but this implication was not noticed until 
afterwards. (Hall was a group theorist, König a graph theorist.) 


Recall that, for J ⊆ {1, ..., n}, we set A(J) = ⋃_{j ∈ J} A_j. We say that F satisfies 
Hall’s Condition if |A(J)| ≥ |J| for all J ⊆ {1, ..., n}. 


(11.10.3) Hall’s Marriage Theorem. The family (A_1, ..., A_n) possesses a SDR if and 
only if it satisfies Hall’s Condition. 


PROOF. As in Chapter 6, the necessity of the condition is clear: if a SDR exists, then 
A(J) must contain representatives of all the sets A_j for j ∈ J, and so must have 
size at least as great as J. So suppose that Hall’s Condition is satisfied. Let G be the 
incidence graph of the family. We have to show that there is a matching of size n in 
G. By König’s Theorem, we must show that any edge-cover in G has size at least n. 
The set of all vertices in B is an edge-cover of G of size n. Let S be any 
edge-cover, and let J = B \ S. Each vertex in A(J) is joined to a vertex of J by an 
edge; so the edge-cover S must contain A(J). Thus 


|S| ≥ |B| − |J| + |A(J)| ≥ n, 
by Hall’s Condition. 
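To make the statement concrete, here is a small Python sketch that tests Hall's Condition by brute force and searches for an SDR by backtracking. It is not the network-flow construction of the text; the function names and the two sample families are assumptions chosen for illustration, and both routines take exponential time in the worst case.

```python
from itertools import combinations

def satisfies_hall(family):
    """Check |A(J)| >= |J| for every non-empty index set J."""
    n = len(family)
    for r in range(1, n + 1):
        for J in combinations(range(n), r):
            union = set().union(*(family[j] for j in J))  # this is A(J)
            if len(union) < r:
                return False
    return True

def find_sdr(family, used=frozenset()):
    """Return distinct representatives, one per set in order, or None if there are none."""
    if not family:
        return []
    first, rest = family[0], family[1:]
    for x in sorted(first):
        if x not in used:
            tail = find_sdr(rest, used | {x})
            if tail is not None:
                return [x] + tail
    return None

good = [{1, 2}, {2, 3}, {1, 3}]
bad = [{1}, {1}, {1, 2}]          # the first two sets violate Hall's Condition
print(satisfies_hall(good), find_sdr(good))  # True [1, 2, 3]
print(satisfies_hall(bad), find_sdr(bad))    # False None
```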


REMARK. We have, in some sense, given a constructive proof of Hall’s Theorem. 
Given a family F satisfying Hall’s Condition, construct its incidence graph 
G and the network N(G). Use the algorithm of the last section to find a maximum 
flow in N(G). Then the edges from A to B carrying non-zero flow define the SDR. 
The network algorithm can be translated into more graph-theoretic language 
for this purpose. A formulation of the algorithm for König’s Theorem is given in 
We know what it means for a graph to be connected. How do we decide in practice? 
Here is an algorithm which computes the connected component of a graph G 
containing a vertex x (and more besides, as we will see). 


(11.11.1) Algorithm: Component containing x 

Mark x with the integer 0. Set d = 0. 
WHILE any vertex was marked at the preceding stage, 
• look at all vertices marked d; mark all unmarked neighbours 
of such vertices with d + 1; 
• replace d by d + 1. 


At the termination of this algorithm, the marked vertices comprise the connected 
component containing x, and the mark of each vertex is the length of the shortest 
path from x to that vertex. 
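Algorithm (11.11.1) can be sketched in Python as follows. The adjacency-dictionary representation, the function name and the sample graph are assumptions made for the example; the loop structure follows the marking procedure step by step.

```python
def component_with_marks(adj, x):
    """Run the marking algorithm from x; return a dict {vertex: mark}."""
    mark = {x: 0}
    d = 0
    newly_marked = [x]
    while newly_marked:              # WHILE any vertex was marked at the preceding stage
        next_layer = []
        for v in newly_marked:       # look at all vertices marked d
            for w in adj[v]:
                if w not in mark:    # an unmarked neighbour
                    mark[w] = d + 1
                    next_layer.append(w)
        newly_marked = next_layer
        d += 1                       # replace d by d + 1
    return mark

# Hypothetical graph: a path 1 - 2 - 3 and a separate edge 4 - 5.
adj = {1: [2], 2: [1, 3], 3: [2], 4: [5], 5: [4]}
print(component_with_marks(adj, 1))  # {1: 0, 2: 1, 3: 2}
print(component_with_marks(adj, 4))  # {4: 0, 5: 1}
```

The keys of the returned dictionary form the component containing x, and each value is the corresponding mark.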
In a connected graph G, the distance d(x, y) from x to y is the length of the 
shortest path from x to y. (Sometimes, distance is defined in a general graph, so 
that the distance between two vertices in different components is ∞; we ignore 
this complication.) The algorithm above gives a method for computing the distance 
between two vertices of a graph. 


The distance function satisfies 

• d(x, y) ≥ 0, and d(x, y) = 0 if and only if x = y; 

• d(x, y) = d(y, x); 

• d(x, y) + d(y, z) ≥ d(x, z). 

The first two properties are clear. For the third, note that there is a walk of length 
d(x, y) + d(y, z) from x to z via y; this can be converted into a path by removing 
repetitions in the usual way, so the length of the shortest path cannot be greater 
than this. 

The third condition is the triangle inequality, which we met already in Section 11.7. 
If you have studied introductory topology, you will recognise the three 
properties as the axioms for a metric. So, in this language, a connected graph, 
equipped with its distance function, is a metric space. 


The diameter of a connected graph G is the maximum value attained by the 
distance function. 

The number of vertices of a graph is bounded in terms of the diameter and the 
maximum valency of a vertex: 


(11.11.2) Theorem. In a connected graph with diameter d and maximum valency k, 
the number of vertices is at most 


1 + k + k(k − 1) + ... + k(k − 1)^(d−1) = 1 + k((k − 1)^d − 1)/(k − 2). 


PROOF. We show by induction that there are at most k(k − 1)^(i−1) vertices at distance 
i from a given vertex x, for i ≥ 1. This is clear for i = 1. For the inductive step, we 
double-count pairs (y, z), where y and z are adjacent and lie at distances i and i + 1 
from x respectively. There are at most k(k − 1)^(i−1) choices for y; each is joined to 
one point at distance i − 1 from x (lying on a shortest path from x to y), and so for 
given y there are at most k − 1 choices for z. On the other hand, for each z, there is 
at least one y (again on a shortest path to z); so there are at most k(k − 1)^i such z. 
Now the result is obtained by summation. 
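The bound in (11.11.2) is easy to tabulate. The sketch below, with function names that are my own, sums the layers term by term and compares the result with the closed form, which only makes sense for k ≠ 2.

```python
def moore_bound(k, d):
    """1 + k + k(k-1) + ... + k(k-1)^(d-1), summed layer by layer."""
    return 1 + sum(k * (k - 1) ** (i - 1) for i in range(1, d + 1))

def moore_bound_closed(k, d):
    """Closed form 1 + k((k-1)^d - 1)/(k-2); requires k != 2."""
    return 1 + k * ((k - 1) ** d - 1) // (k - 2)

print(moore_bound(2, 2))   # 5
print(moore_bound(3, 2))   # 10
print(moore_bound(3, 3), moore_bound_closed(3, 3))  # 22 22
```

For instance, the 3-dimensional cube has valency 3 and diameter 3 but only 8 vertices, well short of the bound 22, whereas the 5-cycle (valency 2, diameter 2) attains the bound 5.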


In the next section, we examine graphs meeting this bound. First, however, we 
prove a ‘dual’ result. 


The girth of a graph G is the length of the shortest closed path in G. Thus, forests 
don’t have a girth (or we could say the girth of a forest is infinite). Alternatively, the 
girth is the smallest n ≥ 3 for which the graph contains the n-cycle C_n as an induced 
subgraph. (A closed path of length n is a subgraph isomorphic to C_n; if it is not 
an induced subgraph, then there must be an edge of G joining two non-consecutive 
vertices, in which case the circuit is cut into two shorter circuits.) 

(11.11.3) Theorem. Let G be a graph of girth g, and let e = ⌊(g − 1)/2⌋. Suppose 
that the minimum valency of G is k. Then G has at least 


1 + k + k(k − 1) + ... + k(k − 1)^(e−1) = 1 + k((k − 1)^e − 1)/(k − 2) 
vertices. 


PROOF. The argument is similar to the previous theorem: we show that, for 1 ≤ i ≤ e, 
the number of vertices at distance i from x is at least k(k − 1)^(i−1). Again, the induction 
begins trivially. Now consider the double count. 

For each y with d(x, y) = i < e, there is one neighbour of y at distance i − 1 
from x, and none at distance i from x. (Otherwise, we could start from x, trek to y, 
and return a different way, to create a closed trek of length 2i or 2i + 1; so there 
would be a closed path of length at most 2i + 1. Since 2i + 1 < g, this is impossible.) 
Thus, at least k − 1 neighbours of y lie at distance i + 1 from x. 

In the same way, given z with d(x, z) = i + 1, there can be only one neighbour y 
of z at distance i from x (since 2(i + 1) ≤ 2e < g by assumption). So the induction 
goes through. 


Close inspection of the argument shows the following: 


Theorem. Of the following conditions on a graph G, any two imply the third: 
• G is connected with maximum valency k and diameter d; 
• G has minimum valency k and girth 2d + 1; 
• G has 1 + k((k − 1)^d − 1)/(k − 2) vertices. 


A graph satisfying these three conditions is called a Moore graph of diameter d 
and valency k. (The first two conditions show that a Moore graph is regular.) In the 
next section, we examine Moore graphs of diameter 2. 

It turns out that Moore graphs are very rare. So the next question is: how close 
to these bounds can we get (for general values of k and d, or asymptotically)? A lot 
of work has been done on this question, but the results will not be described here. 
In this section, we decide (almost completely) for which values of k there exists a 
Moore graph of diameter 2 and valency k. 


Let G be a graph with vertex set {1, 2, ..., n}. The adjacency matrix A(G) of G is the n × n 
matrix whose (i, j) entry is equal to 1 if {i, j} is an edge of G, 0 otherwise. It is a real symmetric 
matrix, and thus it can be diagonalised. The argument involves calculating the eigenvalues of A(G) 
and their multiplicities. 

Let G be a Moore graph of valency k and diameter 2. From the argument in the last section, 
we see that G has 

n = 1 + k + k(k − 1) = k² + 1 


vertices, and that G has girth 5. This means that, if x and y are adjacent, then no vertex is adjacent 
to both; and, if x and y are non-adjacent, they have exactly one common neighbour. 
Let A be the adjacency matrix of G, and J the n × n matrix with every entry 1. If I is the n × n 
identity matrix, we claim that 
A² = kI + (J − I − A). 
To see this, we prove: 


For any graph G, the (i, j) entry of A(G)² is equal to the number of common 
neighbours of i and j. 


For the (i, j) entry is 


Σ_{h=1}^{n} (A)_{ih} (A)_{hj}; 


and every entry of A is zero or 1, so the sum counts the number of vertices h for which (A)_{ih} = 
(A)_{hj} = 1, that is, the number of vertices h joined to both i and j. This proves the assertion. 

Now, in the case of our Moore graph G, the number of common neighbours of i and j is k if 
i = j (since the valency is k), and is 0 if {i, j} is an edge and 1 otherwise. This means that A² has 
diagonal entries k, and off-diagonal entries 0 or 1 according as A has entries 1 or 0 (in other words, 
off the diagonal, it coincides with J − I − A). This proves the claim. 
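The identity can be checked numerically on a concrete example. The sketch below uses the 10-vertex Moore graph of valency 3, built here as the 2-element subsets of {1, ..., 5} with two subsets adjacent when they are disjoint; this description is a standard one that I am assuming, not the construction used in the text, and the helper name mat_mul is hypothetical.

```python
from itertools import combinations

# Vertices: 2-subsets of {1,...,5}; adjacent when disjoint (assumed description).
vertices = [frozenset(p) for p in combinations(range(1, 6), 2)]
n, k = len(vertices), 3
A = [[1 if not (u & v) else 0 for v in vertices] for u in vertices]

def mat_mul(X, Y):
    """Plain integer matrix product of two lists of lists."""
    return [[sum(X[i][h] * Y[h][j] for h in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

A2 = mat_mul(A, A)
# Right-hand side kI + (J - I - A), written entry by entry.
rhs = [[k * (i == j) + (1 - (i == j) - A[i][j]) for j in range(n)]
       for i in range(n)]
print(A2 == rhs)  # True
```

The (i, j) entry of A2 is the number of common neighbours of i and j, which is exactly how the identity was derived.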

Now we examine the spectrum of A. Let j be the vector with all its entries 1. Then the i-th entry 
of Aj is just the row sum of the i-th row of A, which is equal to k since G is regular with valency k. 
Thus, Aj = kj, and j is an eigenvector of A with eigenvalue k. 

Since A is symmetric, the subspace W of R^n consisting of vectors perpendicular to j is preserved 
by A. Also, for any w ∈ W, the sum of the entries of w is 0, and so Jw = 0. Thus, for w ∈ W, we 
have 
A²w = kw + (−I − A)w, 
whence 
(A² + A − (k − 1)I)w = 0. 
Let α be any eigenvalue of A (acting on W). If w is the corresponding eigenvector, then the 
above equation shows that 
α² + α − (k − 1) = 0. 


So α is a root of this quadratic equation, whence 
α = ½(−1 ± √(4k − 3)). 

Now we distinguish two cases. 
Case 1. 4k − 3 is not a perfect square. Then the eigenvalues are irrational. So the multiplicities of the 
two roots of the quadratic, as eigenvalues of A, are equal, and so each is (n − 1)/2 = k²/2. Now we 
use the fact that the sum of the eigenvalues of a matrix is equal to its trace (the sum of the diagonal 
elements). A has the eigenvalue k with multiplicity 1, and (−1 ± √(4k − 3))/2 each with multiplicity 
(n − 1)/2; and its trace is zero, since all its diagonal elements are zero. Thus, we have 


k + (k²/2)·(−1 + √(4k − 3))/2 + (k²/2)·(−1 − √(4k − 3))/2 = 0, 


from which we find that k = k²/2, or k = 2. 
Now there is a unique graph of valency 2, diameter 2, and girth 5: the 5-cycle or pentagon. 


Case 2. 4k − 3 is a square. Since it is odd, so is its square root; say 4k − 3 = (2s + 1)² for some 
integer s, from which we find that k = s² + s + 1. The eigenvalues of A are k (with multiplicity 
1), s, and −s − 1. The multiplicities of the last two eigenvalues are, say, f and g; we know that 
f + g = n − 1 = k². Since the trace of A is equal to 0, we also find that 


k + fs + g(−s − 1) = 0. 
From these two linear equations, it is possible to solve for f and g. We find that 


f = s(s² + s + 1)(s² + 2s + 2)/(2s + 1). 

Now the multiplicity of an eigenvalue of a matrix must be an integer; so we conclude that 
2s + 1 divides s(s² + s + 1)(s² + 2s + 2). 
Multiplying this expression by 32 and doing some manipulation, we find that 2s + 1 divides 
((2s + 1) − 1)((2s + 1)² + 3)((2s + 1)(2s + 3) + 5). 


From this, it is clear that 2s + 1 divides 15, so that 2s + 1 = 1, 3, 5 or 15. This gives the possible 
values 

s = 0, 1, 2 or 7; 

k = 1, 3, 7 or 57; 

n = 2, 10, 50 or 3250. 

The case n = 2 is spurious, since G would have a single edge and would not have diameter 2. 
So we conclude: 


(11.12.1) Theorem. If there is a Moore graph of diameter 2 and valency k, then k = 2, 3, 7 or 57, and 
the number of vertices is 5, 10, 50 or 3250. 
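The divisibility condition from Case 2 can be confirmed by direct search. The following Python sketch tests every s below an arbitrary cut-off (10 000 is my choice, and the function name is hypothetical) and lists the surviving values of s together with k = s² + s + 1 and n = k² + 1.

```python
def multiplicity_is_integer(s):
    """True when 2s + 1 divides s(s^2 + s + 1)(s^2 + 2s + 2)."""
    return (s * (s * s + s + 1) * (s * s + 2 * s + 2)) % (2 * s + 1) == 0

survivors = [s for s in range(10_000) if multiplicity_is_integer(s)]
table = [(s, s * s + s + 1, (s * s + s + 1) ** 2 + 1) for s in survivors]
print(table)  # [(0, 1, 2), (1, 3, 10), (2, 7, 50), (7, 57, 3250)]
```

The search is only a check: the argument in the text shows that no larger s can occur, since 2s + 1 must divide 15.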


For k = 2, we saw that the pentagon is the only graph. In a moment, we will construct the 
unique Moore graph of valency 3. There is also a unique Moore graph of valency 7, though this is 
harder to construct. Nobody knows whether one of valency 57 exists or not! 


THE PETERSEN GRAPH. 


Let G be a Moore graph of valency 3 and diameter 2, with 10 vertices. Let {a, b} be an edge of G. 
Then each of a and b has two further neighbours, with no vertex joined to both. Let b, c, d be the 
neighbours of a, and a, e, f the neighbours of b. There are no edges within the set {c, d, e, f}, for 
any such edge would create a circuit of length 3 or 4. 

Now c and e have a unique common neighbour, since they are not adjacent; let g be this 
neighbour. Similarly, let h be the common neighbour of c and f; i that of d and e; and j that of d 
and f. These vertices are all distinct and are joined to none of a, ..., f except where specified. Now 
we have all vertices. The first six have three neighbours each, and the last four have two each (so 
far); so we need two more edges to complete the graph, with each of g, h, i, j on one edge. But g is 
not joined to h or i; so we have edges {g, j} and {h, i}. 


Fig. 11.3. Uniqueness of a Moore graph 


This completes the unique Moore graph of diameter 2 and valency 3 (see Fig. 11.3). It can be 
drawn in other, more symmetrical ways (as in Fig. 11.4, for example). 


Fig. 11.4. The Petersen graph 


This graph is the notorious Petersen graph. Its fame stems from the fact that it is a 
counterexample to a large number of conjectures in graph theory. If you discover an assertion you believe to 
be true of all graphs, test it first on the Petersen graph! It is now the star of a book in its own right: 
Holton and Sheehan, The Petersen Graph (1993). 
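The uniqueness argument produces an explicit list of 15 edges on the vertices a, ..., j, and it is reassuring to check mechanically that the resulting graph has the Moore property. The Python sketch below is such a check; the variable names and the string encoding of edges are my own.

```python
from itertools import combinations

edges = ["ab", "ac", "ad", "be", "bf",                    # a, b and their neighbours
         "gc", "ge", "hc", "hf", "id", "ie", "jd", "jf",  # g, h, i, j as common neighbours
         "gj", "hi"]                                      # the two final edges
adj = {v: set() for v in "abcdefghij"}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

def common(u, v):
    """Number of common neighbours of u and v."""
    return len(adj[u] & adj[v])

regular = all(len(adj[v]) == 3 for v in adj)
# Adjacent pairs: no common neighbour; non-adjacent pairs: exactly one.
moore = all(common(u, v) == (0 if v in adj[u] else 1)
            for u, v in combinations(adj, 2))
print(regular, moore)  # True True
```

The second condition says precisely that the graph has diameter 2 and no circuits of length 3 or 4.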


To complete the story of Moore graphs, here are the facts. As noted above, there is a unique 
Moore graph of diameter 2 and valency 7, the Hoffman-Singleton graph; the remaining case for 
diameter 2 is unknown. For larger diameter, Damerell, and Bannai and Ito, independently showed 
the following result. 


(11.12.2) Theorem. For d ≥ 3, the only Moore graph of diameter d is the (2d + 1)-cycle C_{2d+1} (with 
valency 2). 


11.13. Exercises 


1. There are 34 non-isomorphic graphs on 5 vertices (compare Exercise 6 of 
Chapter 2). How many of these are (a) connected, (b) forests, (c) trees, (d) Eulerian, 
(e) Hamiltonian, (f) bipartite? 

2. Show that the Petersen graph (Section 11.12) is not Hamiltonian, but does have 
a Hamiltonian path. 

3. Show that the greedy algorithm does not succeed in finding the path of least 
weight between two given vertices in a connected edge-weighted graph. 

4. Consider the modification of the greedy algorithm for minimal connector. Choose 
the edge e for which w(e) is minimal subject to the conditions that S + e contains 
no cycle and e shares a vertex with some previously chosen edge (unless S = ∅). 
Prove that the modified algorithm still correctly finds a minimal connector. 

5. Let G = (V, E) be a multigraph in which every vertex has even valency. Show 
that it is possible to direct the edges of G (that is, replace each unordered pair {x,y} 
by the ordered pair (x, y) or (y, x)) so that the in-valency of any vertex is equal to 
its out-valency. 

6. Let G be a graph on n vertices. Suppose that, for all non-adjacent pairs x, y 
of vertices, the sum of the valencies of x and y is at least n − 1. Prove that G is 
connected. 




7. (a) Prove that a connected bipartite graph has a unique bipartition. 
(b) Prove that a graph G is bipartite if and only if every circuit in G has even 
length. 

8. Choose ten towns in your country. Find from an atlas (or estimate) the distances 
between all pairs of towns. Then 

(a) find a minimal connector; 

(b) use the ‘twice-round-the-tree’ algorithm to find a route for the Travelling 
Salesman. 

How does your route in (b) compare with the shortest possible route? 


9. Consider the result of Chapter 6, Exercise 7, viz. 


Let F = (A_1, ..., A_n) be a family of sets having the property that 
|A(J)| ≥ |J| − d for all J ⊆ {1, ..., n}, where d is a fixed positive 
integer. Then there is a subfamily containing all but d of the sets 
of F, which possesses a SDR. 


Prove this by modifying the proof of Hall’s Theorem from König’s given in the text. 


REMARK. This extension of Hall’s Theorem is in fact ‘equivalent’ to König’s Theorem. 
Can you deduce König’s Theorem from it? 


10. König’s Theorem is often stated as follows: 


The minimum number of lines (rows or columns) which contain 
all the non-zero entries of a matrix A is equal to the maximum 
number of independent non-zero entries, 


where a set of matrix entries is independent if no two are in the same row or column. 
Show the equivalence of this form with the one given in the text. [HINT: if A is m × n, 
let G be the bipartite graph with vertices a_1, ..., a_m, b_1, ..., b_n, in which {a_i, b_j} is an 
edge whenever (A)_{ij} ≠ 0. Show that sets of independent non-zero entries correspond 
to matchings in G, and sets of lines containing all non-zero entries correspond to 
edge-covers of G.] 


11. In this exercise, we translate the ‘stepwise improvement’ algorithm in the proof 
of the Max-Flow Min-Cut Theorem into an algorithm for König’s Theorem. 

Let G be a bipartite graph with bipartition {A, B}. We observed in the text that 
an integer-valued flow f in N(G) corresponds to a matching M in G, consisting 
of those edges {a, b} for which the flow in (a, b) is equal to 1. Now consider the 
algorithm in the proof of the Max-Flow Min-Cut Theorem, which either increases 
the value of the flow by 1, or finds a cut. Suppose that we are in the first case, where 
there is a path 

(s, a_1, b_1, a_2, b_2, ..., a_r, b_r, t) 


in the underlying graph of N(G) along which the flow can be increased. Then 


(a_1, b_1, ..., a_r, b_r) is a path in G, such that all the edges {b_i, a_{i+1}} 
but none of the edges {a_i, b_i} belong to M; moreover, no edge 
containing a_1 or b_r is in M. 
Such a path in G is called an alternating path with respect to M. (An alternating 
path starts and ends with an edge not in M, and edges not in and in M alternate. 
Moreover, since no edge of M contains a_1 or b_r, it cannot be extended to a longer 
such path.) 

Show that, if we delete the edges {b_i, a_{i+1}} from M (i = 1, ..., r − 1), and include 
the edges {a_i, b_i} (i = 1, ..., r), then a new matching M′ with |M′| = |M| + 1 is 
obtained. 

So the algorithm is: 


WHILE there is an alternating path, apply the above replacement to 
find a larger matching. 
When no alternating path exists, the matching is maximal. 
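One possible way to organise this algorithm in Python is sketched below. The exercise does not say how an alternating path should be found; the depth-first search used here, the function names and the sample graph are all assumptions made for illustration.

```python
def maximum_matching(A, adj):
    """adj[a] lists the neighbours in B of a; returns the matching as a dict b -> a."""
    match_of_b = {}

    def augment(a, seen):
        # Search for an alternating path starting at a; seen holds visited B-vertices.
        for b in adj[a]:
            if b in seen:
                continue
            seen.add(b)
            # Either b is unmatched (the path ends here), or the current
            # partner of b can be re-matched further along the path.
            if b not in match_of_b or augment(match_of_b[b], seen):
                match_of_b[b] = a   # swap edges along the path
                return True
        return False

    for a in A:
        augment(a, set())
    return match_of_b

# Hypothetical example with A = {a1, a2, a3} and B = {b1, b2, b3}.
A = ["a1", "a2", "a3"]
adj = {"a1": ["b1", "b2"], "a2": ["b1"], "a3": ["b2", "b3"]}
M = maximum_matching(A, adj)
print(sorted(M.items()))  # [('b1', 'a2'), ('b2', 'a1'), ('b3', 'a3')]
```

When a2 is processed, the edge {a1, b1} is removed and the edges {a2, b1} and {a1, b2} are added, which is exactly the replacement step described in the exercise.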


12. Let G be a graph with adjacency matrix A. Prove that the (i, j) entry of A^d is 
equal to the number of walks of length d from i to j. 


13. This exercise proves the ‘friendship theorem’: in a finite society in which any two 
members have a unique common friend, there is somebody who is everyone else’s 
friend. In graph-theoretic terms, a graph on n vertices in which any two vertices 
have exactly one common neighbour, possesses a vertex of valency n — 1, and is a 
‘windmill’ (Fig. 11.5). 


Fig. 11.5. A windmill 


STEP 1. Let the vertices be 1, ..., n, and let A_i be the set of neighbours of i. Using 
the de Bruijn-Erdős Theorem (Chapter 7), or directly, show that either there is a 
vertex of valency n − 1, or all sets A_i have the same size (and the graph is regular). 
In the latter case, the sets A_i are the lines of a projective plane (Chapter 9). 


STEP 2. Suppose that G is regular, with valency k. Use the eigenvalue technique of 
Section 11.12 to prove that k = 2. 


14. The ‘Trackwords’ puzzle in the Radio Times consists of nine letters arranged in 
a 3 x 3 array. It is possible to form an English word from all nine letters, where 
consecutive letters are adjacent horizontally, vertically or diagonally. Consider the 
problem of setting the puzzle; more specifically, of deciding in how many ways a 
given word (with all its letters distinct) can be written into the array. 

(a) Formulate the problem in graph-theoretic terminology. 

(b) (COMPUTER PROJECT.) In how many ways can it be done? 


12. Posets, lattices and matroids 


... good order and military discipline 


Army regulations 


Topics: Posets, lattices; distributive lattices; (propositional logic); 
chains and antichains; product and dimension; Möbius inversion; 
matroids; (Arrow’s Theorem) 


TECHNIQUES: Möbius inversion 


ALGORITHMS: Calculating the Möbius function; minimum-weight 
basis 

CROSS-REFERENCES: PIE (Chapter 5); Hall’s Theorem (Chapter 6); 
q-binomial theorem (Chapter 9) 


Order is fundamental to the process of measurement: representing objects by 
numbers presupposes that we can arrange them in order. Often, however, we have 
only enough information to decide the order of some pairs of elements; in this case, 
partial order may be a more relevant concept. In this chapter, we introduce some 
of the many themes of the theory of order. 


12.1. Posets and lattices 


First, we recall the definitions, from Chapter 3. A partial order on X is a relation R 
on X which is 
• reflexive: (x, x) ∈ R for all x ∈ X; 
• antisymmetric: (x, y), (y, x) ∈ R imply x = y; and 
• transitive: (x, y), (y, z) ∈ R imply (x, z) ∈ R. 
(Thus, order models the relation ‘less than or equal’. For the connection with ‘less 
than’, see Exercise 17 of Chapter 3.) As usual, we write x ≤ y for (x, y) ∈ R. The 
pair (X, R) is called a partially ordered set, or poset for short. 
Here are some examples of posets. In each case, the point set is {1, ..., n}, for 
some n; we list some elements of R, and the rest follow by reflexivity and transitivity. 
Two comparable points: n = 2, 1 < 2 (so R = {(1, 1), (1, 2), (2, 2)}). 
Two incomparable points: n = 2, R = {(1, 1), (2, 2)}. 
The poset N: n = 4, 1 < 3, 2 < 3, 2 < 4. 
The pentagon: n = 5, 1 < 2 < 5, 1 < 3 < 4 < 5. 
The three-point line: n = 5, 1 < 2 < 5, 1 < 3 < 5, 1 < 4 < 5. 

A convenient way of representing a poset is by its Hasse diagram. We say that y 
covers x if x < y but no element z satisfies x < z < y. (In the list above, we gave all 
pairs x < y for which y covers x.) Now the Hasse diagram of a poset P is a graph 
drawn in the Euclidean plane, such that each vertex corresponds to a point of the 
poset, and for each covering pair x < y, the points representing x and y are joined 
by an edge and the point representing x is ‘below’ the point representing y (in the 
sense that it has smaller Y-coordinate). 

The figure below gives the Hasse diagrams of the five posets described above. 
Note that the Hasse diagram determines the entire poset: u < v if and only if there 
is a path from u to v, every edge of which goes ‘upward’. 


Fig. 12.1. Some Hasse diagrams 
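The passage from covering pairs to the full order, and back again, can be sketched in Python. The function names and the representation of a relation as a set of pairs are my own assumptions; the example is the pentagon from the list of posets above.

```python
def order_from_covers(points, covers):
    """Reflexive and transitive closure of the covering pairs (x, y), read as x < y."""
    R = {(x, x) for x in points} | set(covers)
    changed = True
    while changed:
        changed = False
        for (x, y) in list(R):
            for (u, z) in list(R):
                if y == u and (x, z) not in R:
                    R.add((x, z))     # x <= y and y <= z force x <= z
                    changed = True
    return R

def covering_pairs(points, R):
    """Pairs x < y with nothing strictly between: the edges of the Hasse diagram."""
    strict = {(x, y) for (x, y) in R if x != y}
    return {(x, y) for (x, y) in strict
            if not any((x, z) in strict and (z, y) in strict for z in points)}

pentagon = {(1, 2), (2, 5), (1, 3), (3, 4), (4, 5)}
R = order_from_covers(range(1, 6), pentagon)
print(len(R))                                      # 13
print(covering_pairs(range(1, 6), R) == pentagon)  # True
```

The 13 pairs are the 5 reflexive pairs, the 5 covering pairs, and the 3 pairs (1, 4), (1, 5), (3, 5) forced by transitivity.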


Two specialisations of posets are important. A total order is a partial order 
satisfying 
• trichotomy: for any x, y ∈ X, (x, y) ∈ R or x = y or (y, x) ∈ R. 

(With the definition here, the middle alternative x = y is actually covered by the 
other two; but this would not have been so if we had used the ‘strict’ definition of 
partial order.) In any poset, we say that elements x and y are comparable if either 
(x, y) ∈ R or (y, x) ∈ R. Thus, a total order is a partial order in which any two 
elements are comparable. A total order is sometimes called a linear order¹ and a 
totally ordered set is called a chain. 


A maximal element of a poset (X, ≤) is an element x such that, if x ≤ y, then 
x = y. (We do not require that y ≤ x for all y, so there may be more than one 
maximal element.) Minimal elements are defined dually. 


(12.1.1) Lemma. Any (non-empty) finite poset contains a maximal element. 


PROOF. Choose any x_1 ∈ X. If x_1 is not maximal, there exists x_2 ∈ X with x_1 < x_2 
(which means, of course, that x_1 ≤ x_2 and x_1 ≠ x_2). Continue this process, either 
until a maximal element is found, or we reach an element previously encountered. 
In fact, the second alternative cannot occur; for, if i < j, then 


x_i < x_{i+1} < ... < x_{j−1} < x_j, 


so x_i = x_j is impossible. So eventually a maximal element will be found. 

This argument obviously fails in infinite posets: there is no maximal integer, for 
example. 


1 This usage comes from geometry, where the points on a line in Euclidean space are linearly ordered, 
as opposed to the points of a line in projective space, which are circularly ordered. 
In a poset, we say that z is a lower bound of x and y if z ≤ x and z ≤ y. A 
greatest lower bound (g.l.b.) of x and y is a maximal element in the set of lower 
bounds. By (12.1.1), if two elements of a finite poset have a lower bound, they have 
a greatest lower bound; but it may not be unique. Upper bounds and least upper 
bounds (l.u.b.s) are defined similarly. 

A lattice is a poset in which each pair of elements has a unique greatest lower 
bound and a unique least upper bound. (We are considering only finite lattices here.) 
A lattice has a unique minimal element 0, which satisfies 0 ≤ x for any element x. 
(For let 0 be any minimal element, and x any element. If z is the g.l.b. of 0 and x, 
then z ≤ 0, so z = 0 by minimality, whence 0 ≤ x. If x happened to be a minimal 
element also, then x ≤ 0, whence x = 0 by antisymmetry.) Dually, a lattice has a 
unique maximal element 1, satisfying x ≤ 1 for all x.² 

We use the notation x ∧ y and x ∨ y for the g.l.b. and l.u.b. of x and y in a 
lattice. These are also called the meet and join of x and y. 

Any totally ordered set is a lattice: if x ≤ y, then x ∧ y = x and x ∨ y = y. Other 
examples of lattices include: 

• The power-set lattice P(X), whose elements are the subsets of a set X, ordered 
by inclusion. It has x ∧ y = x ∩ y and x ∨ y = x ∪ y. 

• The lattice D(n) of (positive) divisors of the positive integer n, ordered by 
divisibility: x ≤ y if x divides y. The g.l.b. and l.u.b. of x and y are their greatest 
common divisor (x, y) and least common multiple xy/(x, y) respectively. 

• The lattice of subspaces of a finite vector space V = V(n, q), ordered by 
inclusion: this is the projective geometry PG(n, q), looked at in a different way. 
We have x ∧ y = x ∩ y and x ∨ y = ⟨x, y⟩ = x + y (sum of subspaces!) respectively. 

Following the nineteenth-century tendency towards abstraction and axiomatisation 
in mathematics, a lattice can be regarded as a set on which are defined two 
binary operations ∧ and ∨ and two elements 0 and 1. The next result gives the 
axiomatisation of lattices from this point of view. 


(12.1.2) Proposition. Let X be a set, ∧ and ∨ two binary operations defined on X, 
and 0 and 1 two elements of X. Then (X, ∧, ∨, 0, 1) is a lattice if and only if the 
following axioms are satisfied: 

• Associative laws: x ∧ (y ∧ z) = (x ∧ y) ∧ z and x ∨ (y ∨ z) = (x ∨ y) ∨ z; 

• Commutative laws: x ∧ y = y ∧ x and x ∨ y = y ∨ x; 

• Idempotent laws: x ∧ x = x ∨ x = x; 

• x ∧ (x ∨ y) = x = x ∨ (x ∧ y); 

• x ∧ 0 = 0, x ∨ 1 = 1. 


PROOF. Verifying that the axioms hold in a lattice is not difficult; try it yourself. 
The converse is a little harder. We have to recover the partial order from the lattice 
operations. If x ≤ y, then the g.l.b. of x and y is obviously x; we reverse this and 
define the relation ≤ by the rule that x ≤ y if x ∧ y = x. We have to show that this 
really is a partial order, and that x ∧ y and x ∨ y are the g.l.b. and l.u.b., and 0 and 
1 the least and greatest elements, in this order. 


2 In an infinite lattice, the existence of 0 and 1 cannot be deduced, and must be postulated. 
First, note that x ∧ y = x implies x ∨ y = y ∨ (y ∧ x) = y, using the commutative 
laws; and conversely. So the ‘dual’ definition of the order is equivalent to the one 
we used. 

Now we show that ≤ is a partial order. The idempotent laws show that it is 
reflexive. Suppose that x ≤ y and y ≤ x. Then 

x = x ∧ y = y ∧ x = y, 


so the relation is antisymmetric. Finally, suppose that x ≤ y and y ≤ z. Then 
x ∧ y = x and y ∧ z = y. So 


x ∧ z = (x ∧ y) ∧ z = x ∧ (y ∧ z) = x ∧ y = x, 


so x ≤ z. 
Now, for any x and y, 


(x ∧ y) ∧ y = x ∧ (y ∧ y) = x ∧ y, 


so (x ∧ y) ≤ y. By commutativity, also (x ∧ y) ≤ x. Thus, x ∧ y is a lower bound for 
x and y. If z is any lower bound, then 


z ∧ (x ∧ y) = (z ∧ x) ∧ y = z ∧ y = z, 


so z ≤ (x ∧ y). It follows that x ∧ y is the unique greatest lower bound. The proof 
that x ∨ y is the unique least upper bound is dual. 

Finally, the last axiom shows that 0 is the unique minimal element and 1 the 
unique maximal element. 
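The axioms of (12.1.2) can be tested exhaustively on a small example. The sketch below uses the divisor lattice D(n), with meet the greatest common divisor, join the least common multiple, 0 the number 1 and 1 the number n; the function name and the choice of n are assumptions for illustration. It also confirms that the order recovered from the rule x ≤ y if x ∧ y = x is divisibility.

```python
from math import gcd
from itertools import product

def check_axioms(n):
    """Test every axiom of (12.1.2) on all triples of divisors of n."""
    D = [d for d in range(1, n + 1) if n % d == 0]
    meet = gcd
    def join(x, y):
        return x * y // gcd(x, y)      # least common multiple
    zero, one = 1, n                   # least and greatest elements of D(n)
    laws = []
    for x, y, z in product(D, repeat=3):
        laws += [
            meet(x, meet(y, z)) == meet(meet(x, y), z),
            join(x, join(y, z)) == join(join(x, y), z),
            meet(x, y) == meet(y, x),
            join(x, y) == join(y, x),
            meet(x, x) == x == join(x, x),
            meet(x, join(x, y)) == x == join(x, meet(x, y)),
            meet(x, zero) == zero,
            join(x, one) == one,
            # recovered order: x <= y iff x meet y = x iff x divides y
            (meet(x, y) == x) == (y % x == 0),
        ]
    return all(laws)

print(check_axioms(12), check_axioms(60))  # True True
```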


12.2. Linear extensions of a poset 


As in the introduction to this chapter, we can regard a partial order as expressing 
our partial knowledge of some underlying total order. This suggests that every 
partial order is a subset of a total order. This is indeed true: 


(12.2.1) Theorem. Let R be a partial order on X. Then there is a total order R* on 
X such that R ⊆ R*. 


A total order containing the partial order R is called a linear extension of R (the 
word ‘linear’ coming from the alternative term ‘linear order’ for a total order). If X 
is finite, this result can be expressed in the form: 


Let (X, ≤) be a poset. Then we can label the elements of X as 
x_1, ..., x_n such that, if x_i < x_j, then i < j. 


Our proof will, as usual, assume that X is finite. The idea of the proof is that, if 
R is not itself a total order, then some pair of elements is incomparable; intuitively, 
we don’t yet know the order of these elements. We enlarge R by specifying the order 
of the two elements, and adding various consequential information. The resulting 
relation R′ is still a partial order. After a finite number of steps, there are no more 
incomparable elements, and we have a total order. 
So let a, b be incomparable. If we specify a < b, then everything below a must 
become less than everything above b. So we put 


R′ = R ∪ (↓a × ↑b), 


where ↓a = {x : (x, a) ∈ R} and ↑b = {y : (b, y) ∈ R}. We claim that R′ is a partial 
order. It is clearly reflexive, since R is. Also, note that ↓a ∩ ↑b = ∅; for, if x lies 
in this intersection, then (b, x), (x, a) ∈ R, so (b, a) ∈ R by transitivity, contradicting 
the incomparability of a and b. 

Suppose that (x, y), (y, x) ∈ R′. If both pairs lie in R, then x = y by antisymmetry 
of R. The remark in the last paragraph shows that we cannot have (x, y), (y, x) ∈ 
↓a × ↑b. The remaining case is that (without loss of generality) (x, y) ∈ R, 
(y, x) ∈ ↓a × ↑b. Then (b, x), (x, y), (y, a) ∈ R, again contradicting the choice of a 
and b. 

The proof of transitivity is very similar. If (x, y), (y, z) ∈ R, then (x, z) ∈ R; 
we cannot have (x, y), (y, z) ∈ ↓a × ↑b; and, if (x, y) ∈ R, (y, z) ∈ ↓a × ↑b, then 
x ∈ ↓a, so (x, z) ∈ ↓a × ↑b ⊆ R′. 

The proof is complete. 
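The proof is constructive, and its steps translate directly into code. In the Python sketch below, the function name, the representation of R as a set of pairs (which must already be reflexive and transitive), and the rule for choosing the incomparable pair are my own assumptions; the enlargement step is the one in the proof.

```python
def linear_extension(points, R):
    """Enlarge the partial order R to a total order by repeated use of R' = R ∪ (↓a × ↑b)."""
    R = set(R)
    points = list(points)
    while True:
        # First incomparable pair in scanning order, if any.
        pair = next(((a, b) for a in points for b in points
                     if (a, b) not in R and (b, a) not in R), None)
        if pair is None:
            return R                                    # no incomparable pair: R is total
        a, b = pair
        down_a = {x for x in points if (x, a) in R}     # everything below a
        up_b = {y for y in points if (b, y) in R}       # everything above b
        R |= {(x, y) for x in down_a for y in up_b}

# The poset N: 1 < 3, 2 < 3, 2 < 4, together with the reflexive pairs.
N = {(1, 3), (2, 3), (2, 4)} | {(x, x) for x in range(1, 5)}
T = linear_extension(range(1, 5), N)
labelling = sorted(range(1, 5), key=lambda x: sum((y, x) in T for y in range(1, 5)))
print(labelling)  # [1, 2, 3, 4]
```

With this choice of pairs the poset N is extended first by 1 < 2 and then by 3 < 4, giving the chain 1 < 2 < 3 < 4; other choices would give other linear extensions.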


12.3. Distributive lattices 


A lattice L is distributive if it satisfies the two distributive laws 


x ∨ (y ∧ z) = (x ∨ y) ∧ (x ∨ z), 
x ∧ (y ∨ z) = (x ∧ y) ∨ (x ∧ z). 


Two of our examples of lattices are distributive: the lattice P(X) of subsets of 
a set X, and the lattice of divisors of a positive integer n. (In the first case, the 
distributive laws are familiar equations connecting unions and intersections of sets, 
easily checked with a Venn diagram. The second is a little harder to see; try it for 
yourself.) 


In view of the first example, any sublattice of the lattice P(X) of subsets of X 
(that is, any family of subsets of X which is closed under union and intersection) is 
a distributive lattice. We could ask whether, conversely, any distributive lattice can 
be represented in this way. This is indeed true, and we prove a stronger version. 


Let $P = (X, \le)$ be a poset. A subset $Y$ of $X$ is called a down-set if $y \in Y$, $z \le y$
imply $z \in Y$; that is, anything lying below an element of $Y$ is in $Y$.³ There are two
trivial down-sets in any poset: the empty set, and the whole of $X$.


(12.3.1) Proposition. The union or intersection of two down-sets is a down-set.

Proof. Let $Y_1$ and $Y_2$ be down-sets. Suppose that $y \in Y_1 \cup Y_2$ and $z \le y$. Then
$y \in Y_1$ or $y \in Y_2$; so $z \in Y_1$ or $z \in Y_2$, whence $z \in Y_1 \cup Y_2$. The argument for
intersections is similar.


3 The term ‘ideal’ is often used. But it has another, conflicting, meaning.

Thus, the set of all down-sets of $P$, with the operations of union and intersection,
is a distributive lattice (whose 0 and 1 are the trivial down-sets). We denote this
lattice by $L(P)$. For example, if $P$ is the poset $N$ of Fig. 12.1, then $L(P)$ is shown
in Fig. 12.2. Now every finite distributive lattice has a canonical representation of
this form.


Fig. 12.2. The lattice L(N)
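
As a rough check on the definitions, the following Python sketch lists the down-sets of a four-element poset shaped like N. The labelling a < c, b < c, b < d is an assumption made for illustration (Fig. 12.1 is not reproduced here), and all function names are hypothetical.

```python
from itertools import combinations

# Assumed labelling of the poset N: a < c, b < c, b < d (already transitive).
ELEMENTS = "abcd"
STRICTLY_BELOW = {("a", "c"), ("b", "c"), ("b", "d")}

def is_down_set(subset):
    """Y is a down-set if z < y and y in Y force z in Y."""
    return all(z in subset for (z, y) in STRICTLY_BELOW if y in subset)

def down_sets():
    """All down-sets, as frozensets, found by testing every subset."""
    return [frozenset(c) for k in range(len(ELEMENTS) + 1)
            for c in combinations(ELEMENTS, k) if is_down_set(c)]

def principal_down_set(x):
    """The set of elements lying below or equal to x."""
    return frozenset({x} | {z for (z, y) in STRICTLY_BELOW if y == x})

def join_indecomposables(lattice):
    """Non-empty members that are not the union of two other members."""
    return {a for a in lattice if a and
            not any(b | c == a for b in lattice for c in lattice
                    if b != a and c != a)}
```

With this labelling there are eight down-sets, and the join-indecomposable ones are exactly the four principal down-sets.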


(12.3.2) Theorem. Let L be a finite distributive lattice. Then there is a finite poset P 
(uniquely determined by L) such that L is isomorphic to L(P). 


Proof. How can we recover the elements of $P$ from $L$? For any point $x$ of $P$, the
set ${\downarrow}x = \{y : y \le x\}$ is a down-set, the principal down-set determined by $x$. We have
to recognise elements of $L$ corresponding to principal down-sets.

An element $a \ne 0$ of a lattice $L$ is called join-indecomposable, or JI for short, if
$a = b \vee c$ implies $a = b$ or $a = c$. Now, in $L(P)$, any principal down-set is JI. For, if
${\downarrow}x = b \vee c$, then $x \in b$ or $x \in c$, whence ${\downarrow}x = b$ or ${\downarrow}x = c$ (as $b$ and $c$ are down-sets).
Conversely, any JI in $L(P)$ is a principal down-set. (In Fig. 12.2, the JI elements of
$L(N)$ are represented by solid circles. Note that they form a sub-poset isomorphic
to $N$.)

So, in any finite distributive lattice $L$, we let $P(L)$ be the set of JI elements, with
order inherited from $L$. Then $P(L)$ is the only possible candidate for a poset $P$ such
that $L(P) \cong L$; we show that, indeed, $L(P(L)) \cong L$. The proof is in a number of
steps.

STEP 1. Every non-zero element of L is a join of JI elements. 


Proof. If $a \in L$ is JI, we are done. Otherwise, $a$ is a join of two elements strictly
below it in the lattice. By induction (for example, on the number of elements below 
a), these two elements are joins of JI elements; so the same is true of a. 


STEP 2. Every non-zero element a € L is the join of all the JI elements below it. 


Proof. We know that $a$ is the join of some of these elements. The join of all of
them is no smaller, but is still no larger than $a$.


These two steps apply also to 0, if we interpret the join of the empty set as 0. 


Now let $X$ be the set of all JI elements (the elements of the poset $P(L)$); for
any $a \in L$, let $s(a) = \{x \in X : x \le a\}$. We show that $s$ is an isomorphism from $L$
to $L(P(L))$.


STEP 3. s(a) is a down-set. 


Proof. Clear from the definition.


STEP 4. $s$ is a bijection.


Proof. That $s$ is one-to-one follows from the fact that $a$ is the join of the elements in
$s(a)$. Now let $Y$ be any down-set in $P(L)$, and let $a$ be the join of the elements in $Y$.
Then each $y \in Y$ satisfies $y \le a$. Suppose that $z$ is a JI element with $z \notin Y$ and $z \le a$.
If $Y = \{y_1, \ldots, y_n\}$, then we have $z \le y_1 \vee \ldots \vee y_n$, so $z \wedge (y_1 \vee \ldots \vee y_n) = z$.
By the distributive law, $(z \wedge y_1) \vee \ldots \vee (z \wedge y_n) = z$. But $z$ is JI; so, for some $i$,
we have $z \wedge y_i = z$, whence $z \le y_i$. But this contradicts the facts that $z \notin Y$,
$y_i \in Y$, and $Y$ is a down-set. We conclude that $Y = s(a)$. So $s$ is onto.


STEP 5. $s$ is an isomorphism, i.e.
(a) $s(a \wedge b) = s(a) \cap s(b)$,
(b) $s(a \vee b) = s(a) \cup s(b)$.

Proof. (a) For $x \in X$, we have $x \le a \wedge b$ if and only if $x \le a$ and $x \le b$.
(b) Take $x \in s(a) \cup s(b)$. Then either $x \in s(a)$ or $x \in s(b)$; so $x \le a$ or $x \le b$,
whence $x \le a \vee b$. Conversely, suppose that $x \in s(a \vee b)$, so $x \le a \vee b$. Then
\[ x = x \wedge (a \vee b) = (x \wedge a) \vee (x \wedge b). \]
Since $x$ is JI, $x = x \wedge a$ or $x = x \wedge b$, whence $x \in s(a)$ or $x \in s(b)$.
This completes the proof.
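
The five steps can be checked by machine on a small example. The sketch below is an illustration only: it takes the lattice $D(n)$ of divisors of $n$, with gcd as meet and lcm as join, computes the JI elements, and builds the map $s$. The function names are hypothetical.

```python
from math import gcd
from itertools import combinations

def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]

def lcm(a, b):
    return a * b // gcd(a, b)

def join_indecomposables(n):
    """Divisors a != 1 (the lattice 0) that are not the lcm of two other divisors."""
    divs = divisors(n)
    return [a for a in divs if a != 1 and
            not any(lcm(b, c) == a for b in divs for c in divs
                    if b != a and c != a)]

def s(a, ji):
    """The map s of the proof: the set of JI elements below a."""
    return frozenset(x for x in ji if a % x == 0)

def down_sets(ji):
    """All down-sets of the JI elements, ordered by divisibility."""
    found = []
    for k in range(len(ji) + 1):
        for c in combinations(ji, k):
            if all(z in c for y in c for z in ji if y % z == 0):
                found.append(frozenset(c))
    return found
```

For $n = 60$ the JI elements come out as the prime powers 2, 3, 4, 5, and $s$ matches the twelve divisors with the twelve down-sets.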


Among distributive lattices, a special class are the Boolean lattices. These are the
distributive lattices $L$ possessing a unary operation $x \mapsto x'$ called complementation,
satisfying

• $(x \vee y)' = x' \wedge y'$, $(x \wedge y)' = x' \vee y'$;
• $x \vee x' = 1$, $x \wedge x' = 0$.


(12.3.3) Theorem. A finite Boolean lattice is isomorphic to the lattice of all subsets
of a finite set $X$, with $x'$ interpreted as $X \setminus x$.


Proof. Let $L$ be a finite Boolean lattice. We have an embedding of $L$ into $P(X)$,
where $X$ is the set of JI elements of $L$. To show that $L \cong P(X)$, we show that any
two JI elements are incomparable; then any set of JI elements is a down-set.

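So suppose that $a$ and $b$ are distinct JI elements with $a < b$. Then
\[ a \vee (b \wedge a') = (a \vee b) \wedge (a \vee a') = b \wedge 1 = b. \]
Since $b$ is JI and $a \ne b$, we must have $b = b \wedge a' \le a'$. Then
\[ a = a \wedge b = a \wedge (b \wedge a') = b \wedge (a \wedge a') = b \wedge 0 = 0, \]
a contradiction.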


Now, if $s$ is the lattice-isomorphism from $L$ to $P(X)$ as in Theorem (12.3.2), we
have $s(a) \cap s(a') = \emptyset$, $s(a) \cup s(a') = X$; so $s(a') = X \setminus s(a)$, as claimed.

Another interesting class consists of the free distributive lattices. These are
generated (in the algebraic sense) by a set $X = \{x_1, \ldots, x_n\}$, and have the property that
two expressions in the generators are unequal unless the definition of a distributive
lattice forces them to be equal. I will identify the free distributive lattice as $L(P)$, as
in (12.3.2), but with a bit of hand-waving; a rigorous proof has to use properly the
formal algebraic definition of freeness.

Using the distributive laws, any element other than 0 and 1 can be written as a 
join of terms, each of which is a meet of some elements of X. So the only possible 
join-indecomposables apart from 1 are the meets of the non-empty subsets of X. 
The JI element 1 corresponds to the empty set. The order in the lattice of these 
meets is the reverse of the inclusion order of the subsets. 

Moreover, a down-set in the poset of meets of subsets of $X$ corresponds to an
up-set in $P(X)$. Since $P(X)$ is ‘self-dual’, we have:


(12.3.4) Proposition. The free distributive lattice generated by an n-set X is iso- 
morphic to L(P(X)), in other words, to L(L(A)), where A is an antichain with n 
elements. 


However, nobody knows a formula for the number of elements in this lattice for 
arbitrary n. This is a famous unsolved problem. The answer is known only for very 
small values of n. 
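
For very small $n$ the count can be found by brute force. The sketch below is illustrative and feasible only for $n \le 4$; the function name is hypothetical. It counts the elements of $L(P(X))$, that is, the families of subsets of an $n$-set that are closed under taking subsets.

```python
def count_down_sets_of_power_set(n):
    """Count the down-sets of P(X) for |X| = n by testing every family of subsets."""
    size = 1 << n                       # subsets of {0, ..., n-1} encoded as bitmasks
    count = 0
    for family in range(1 << size):     # bit s of `family` records whether subset s is present
        # A family is a down-set as soon as it is closed under deleting one element.
        closed = all(
            (family >> (s ^ (1 << i))) & 1
            for s in range(size) if (family >> s) & 1
            for i in range(n) if (s >> i) & 1
        )
        count += closed
    return count
```

For $n = 0, 1, 2, 3, 4$ this gives 2, 3, 6, 20, 168. The search examines $2^{2^n}$ families, which is why the approach stops being practical almost immediately.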


12.4. Aside on propositional logic 


The name of Boole is familiar to every computer scientist today, as a result of his 
project to turn set theory and logic into algebra. We now sketch the details. 


Expressions in Boole’s system are built from variables, just as polynomials are; but a Boolean
variable can take only the two values TRUE and FALSE. (Think of these variables as elementary 
statements or propositions out of which more complicated expressions can be built.) 

We start with a set $P$ of propositional or Boolean variables. A formula is an expression
involving variables, parentheses, and the connectives $\vee$ (disjunction, ‘or’), $\wedge$ (conjunction, ‘and’), and
$\neg$ (negation, ‘not’), defined by the rules

• any propositional variable is a formula;
• if $\phi$ and $\psi$ are formulae, so are $(\phi \vee \psi)$, $(\phi \wedge \psi)$, and $(\neg\phi)$;
• any formula is obtained by these two rules.

In other words, the set of formulae is the smallest set of strings of variables, parentheses and
connectives which contains the variables and is closed under the three constructions specified in the
second rule.

A valuation is a function $v$ from the set of variables to the set $\{\mathrm{TRUE}, \mathrm{FALSE}\}$. By induction, $v$
defines a function from the set of formulae to the set $\{\mathrm{TRUE}, \mathrm{FALSE}\}$, which is also called a valuation
and denoted by $v$, such that the usual ‘truth table rules’ for the connectives apply:

• if $v(\phi) = \mathrm{TRUE}$ then $v((\neg\phi)) = \mathrm{FALSE}$, and vice versa;
• $v((\phi \vee \psi)) = \mathrm{TRUE}$ unless $v(\phi) = v(\psi) = \mathrm{FALSE}$, in which case $v((\phi \vee \psi)) = \mathrm{FALSE}$;
• $v((\phi \wedge \psi)) = \mathrm{FALSE}$ unless $v(\phi) = v(\psi) = \mathrm{TRUE}$, in which case $v((\phi \wedge \psi)) = \mathrm{TRUE}$.

Further connectives can be defined in terms of the ones already given. For example, $(\phi \to \psi)$
is shorthand for $((\neg\phi) \vee \psi)$, and $(\phi \leftrightarrow \psi)$ for $((\phi \to \psi) \wedge (\psi \to \phi))$. Truth tables for these can be
calculated. For example, $v((\phi \leftrightarrow \psi)) = \mathrm{TRUE}$ if and only if $v(\phi) = v(\psi)$.
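
The inductive extension of a valuation to all formulae is easy to mechanise. In the sketch below, formulae are represented as nested tuples such as `("or", "p", ("not", "q"))`; this representation and the function names are assumptions made for illustration, not notation from the text.

```python
from itertools import product

def evaluate(formula, v):
    """Extend a valuation v (a dict: variable -> bool) to an arbitrary formula."""
    if isinstance(formula, str):          # a propositional variable
        return v[formula]
    op, *args = formula
    if op == "not":
        return not evaluate(args[0], v)
    if op == "or":
        return evaluate(args[0], v) or evaluate(args[1], v)
    if op == "and":
        return evaluate(args[0], v) and evaluate(args[1], v)
    raise ValueError(f"unknown connective: {op}")

def implies(f, g):
    """(f -> g) as shorthand for ((not f) or g)."""
    return ("or", ("not", f), g)

def iff(f, g):
    """(f <-> g) as shorthand for ((f -> g) and (g -> f))."""
    return ("and", implies(f, g), implies(g, f))

def truth_table(formula, variables):
    """Map each assignment of truth values to the value of the formula."""
    return {values: evaluate(formula, dict(zip(variables, values)))
            for values in product([True, False], repeat=len(variables))}
```

Calling `truth_table(iff("p", "q"), ["p", "q"])` gives TRUE exactly on the two rows where the variables agree.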

A formula $\phi$ is called a tautology if $v(\phi) = \mathrm{TRUE}$ for all valuations $v$, a contradiction if
$v(\phi) = \mathrm{FALSE}$ for all $v$ (that is, if $(\neg\phi)$ is a tautology). Two formulae $\phi, \psi$ are equivalent if $v(\phi) = v(\psi)$
for all valuations $v$; that is, if $(\phi \leftrightarrow \psi)$ is a tautology.

Now it can be checked that the ‘equivalence’ just defined is an equivalence relation, and that the
connectives induce operations on the set of equivalence classes: if $[\phi]$ denotes the equivalence class
of $\phi$, then we can set
\[ [\phi] \vee [\psi] = [\phi \vee \psi], \qquad [\phi] \wedge [\psi] = [\phi \wedge \psi], \qquad [\phi]' = [\neg\phi], \]


and the objects defined don’t depend on the choice of representatives of the equivalence classes. Now 
Boole’s observation can be summarised as follows: 


(12.4.1) Proposition. The set of equivalence classes of propositional formulae, with the above opera- 
tions, is a Boolean lattice. 


Suppose that there are $n$ propositional variables. The number of valuations is $2^n$. Any formula
$\phi$ defines a function $v \mapsto v(\phi)$ from valuations to $\{\mathrm{TRUE}, \mathrm{FALSE}\}$, and two formulae are equivalent
if and only if they define the same function. Any function is represented by some formula, so the
number of equivalence classes is $2^{2^n}$. So the Boolean lattice has $2^{2^n}$ elements.

By (12.3.3), any Boolean lattice is isomorphic to $P(X)$ for some set $X$. Can we identify such an
$X$ here? It must have cardinality $2^n$. An answer is given by the disjunctive normal form:


(12.4.2) Disjunctive normal form. Any formula in the variables $p_1, \ldots, p_n$ which is not a contradiction
is equivalent to a unique disjunction of terms $(q_1 \wedge \ldots \wedge q_n)$, where each $q_i$ is either $p_i$ or $(\neg p_i)$.


There are $2^n$ ‘terms’ of the form described in the proposition, and each equivalence class of
formulae corresponds to a subset of the set of terms. (The equivalence class of contradictions
corresponds to the empty set of terms.) Moreover, the operations $\vee$, $\wedge$, $'$ on equivalence classes
correspond to union, intersection, and complementation on sets of terms. So the set of terms is the
required $X$.

Another approach to the question gives an even more obvious answer: take $X$ to be the set of
valuations, and identify an equivalence class with the subset consisting of valuations which give the
formulae in that class the value TRUE. To see the correspondence between the two approaches, note
that there is a unique valuation which gives the term $q_1 \wedge \ldots \wedge q_n$ the value TRUE, namely the one
defined by
\[ v(p_i) = \begin{cases} \mathrm{TRUE} & \text{if } q_i = p_i, \\ \mathrm{FALSE} & \text{if } q_i = (\neg p_i). \end{cases} \]


The disjunctive normal form theorem can be used to show that the lattice of equivalence classes
of propositional formulae in $n$ variables is the free Boolean lattice on $n$ generators (compare the
remarks at the end of the last section on free distributive lattices).
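
The correspondence between equivalence classes and sets of terms can be sketched as follows. Formulae are again nested tuples (an assumed representation, with hypothetical function names); the set of valuations making a formula TRUE is computed, and each such valuation is read as one term of the disjunctive normal form.

```python
from itertools import product

def evaluate(formula, v):
    """Value of a formula under the valuation v (a dict: variable -> bool)."""
    if isinstance(formula, str):
        return v[formula]
    op, *args = formula
    if op == "not":
        return not evaluate(args[0], v)
    if op == "or":
        return evaluate(args[0], v) or evaluate(args[1], v)
    if op == "and":
        return evaluate(args[0], v) and evaluate(args[1], v)
    raise ValueError(f"unknown connective: {op}")

def dnf_terms(formula, variables):
    """The valuations (tuples of booleans) giving TRUE; each one is a DNF term."""
    return {values for values in product([True, False], repeat=len(variables))
            if evaluate(formula, dict(zip(variables, values)))}

def term_to_string(values, variables):
    """Write a valuation as the term (q_1 ∧ ... ∧ q_n) it satisfies."""
    literals = [p if val else f"¬{p}" for p, val in zip(variables, values)]
    return "(" + " ∧ ".join(literals) + ")"
```

Two formulae are equivalent exactly when `dnf_terms` returns the same set for both, and a contradiction yields the empty set.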


12.5. Chains and antichains 


A chain C in a poset P is a subset of P such that any two of its points are 
comparable. In other words, it is a sub-poset which is a total order. An antichain A 
is a subset such that any two of its points are incomparable. 

We have met these concepts before. Sperner’s Theorem (7.2.1) describes the 
largest antichains in the lattice P(X) of subsets of X. Our proof of this by the LYM 
technique involved covering the poset by chains. A crucial point in the argument 
was: 


If $C$ is a chain and $A$ an antichain in a poset, then $|C \cap A| \le 1$.


For two points in this intersection would be both comparable and incomparable! 
From this, we immediately see:


(12.5.1) Proposition. (a) If a poset P has a chain of size r, then it cannot be
partitioned into fewer than r antichains. 

(b) If a poset P has an antichain of size r, then it cannot be partitioned into 
fewer than r chains. 


The proof is trivial, since two points in the same chain must lie in different 
members of a partition into antichains, and ‘dually’. The main goal of this section
is to prove a pair of results in the reverse direction. The first is straightforward: 


(12.5.2) Theorem. Suppose that the largest chain in the poset P has size r. Then P 
can be partitioned into r antichains. 


Proof. We define the height of an element $x$ of $P$ to be one less than the greatest
number of elements in a chain whose greatest member is $x$. (The ‘one less’ is
conventional: the height of $x$ is the greatest number of ‘steps’ up from the bottom
of the poset to $x$.) Let $A_i$ be the set of elements of height $i$. Then, by hypothesis,
$A_i = \emptyset$ for $i \ge r$, so $P = A_0 \cup \ldots \cup A_{r-1}$; and each $A_i$ is an antichain, since if $x \in A_i$
and $x < y$, then there is a chain $x_0 < \ldots < x_i = x < y$, so $y$ has height greater than
$i$.
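
This proof is constructive, and a short Python sketch (with a hypothetical function name, assuming the strict order is supplied as a predicate) shows the partition by height.

```python
def height_partition(elements, less_than):
    """Split a finite poset into the antichains A_0, A_1, ... of elements of equal height."""
    height = {}

    def h(x):
        # Height of x: one more than the largest height of an element strictly below x.
        if x not in height:
            below = [h(z) for z in elements if less_than(z, x)]
            height[x] = 1 + max(below) if below else 0
        return height[x]

    levels = {}
    for x in elements:
        levels.setdefault(h(x), []).append(x)
    return [levels[i] for i in range(len(levels))]
```

For the divisors of 12 ordered by divisibility, the levels are [1], [2, 3], [4, 6], [12]: four antichains, matching the longest chain 1 < 2 < 4 < 12.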


The ‘dual’ result looks similar, but the proof is much more involved.⁴


(12.5.3) Dilworth’s Theorem. Suppose that the largest antichain in the poset P has 
size r. Then P can be partitioned into r chains. 


Proof. The proof is by induction on the number of points of $P$. Clearly the result
holds for one-element posets. So suppose that it is true for all posets with fewer
points than $P$. Let $x$ be a minimal element of $P$.


CASE 1. $x$ is incomparable with everything else in $P$. Then the largest antichain
in $P \setminus \{x\}$ has size $r - 1$, since adjoining $x$ gives a larger antichain. By induction,
$P \setminus \{x\}$ can be partitioned into $r - 1$ chains; we add the singleton chain $\{x\}$ to
produce the required partition.


CASE 2. Some other points are comparable with $x$. By induction, we can partition
$P \setminus \{x\}$ into $r$ chains $C_1, \ldots, C_r$. For each $i$, let $T_i$ be the set of elements of $C_i$ which
are comparable with $x$, and $B_i = C_i \setminus T_i$; let $B = B_1 \cup \ldots \cup B_r$. Then every element
of $T_i$ is greater than $x$, since $x$ is minimal; $T_i$ is above $B_i$ for each $i$, and $B$ is the
set of all elements incomparable with $x$. We colour the points of $B$ with $r$ colours
$c_1, \ldots, c_r$, by the rule that $y$ has colour $c_i$ if $y \in C_i$.

By the argument of Case 1, $B$ can be written as the union of $r - 1$ chains
$C'_1, \ldots, C'_{r-1}$. Each of these chains can be partitioned into ‘runs’ of elements of
the same colour. We are about to do some rearranging of these chains, which
may have to be repeated an unspecified number of times. But each rearrangement
strictly decreases the total number of colour runs which occur; so we know that the
rearrangement process will terminate after a finite number of moves.


4 The result is uniformly known as Dilworth's Theorem. It was published by Dilworth in 1950. It
had been found a few years earlier by Gallai and Milgram, but publication was delayed because
Gallai wanted the paper translated into English, and Milgram, a topologist, did not fully appreciate
its importance.

A move is as follows. Suppose that the greatest elements of two or more of
the chains $C'_1, \ldots, C'_{r-1}$ have the same colour $c_i$. Take the union of all the runs of
colour $c_i$ which lie at the top of their chains. This union $U$ is itself a chain, since it
is a subset of $C_i$. If $y$ is the smallest element of $U$, and $y \in C'_j$, then we move all the
elements of $U$ to $C'_j$, where they sit at the top, forming a single run. So the number
of runs has decreased, as claimed; and the new $C'_k$ are still chains.

At some stage, it is no longer possible to apply a move of this type. This
must be because the greatest elements of the chains all have different colours.
Renumbering if necessary, we may assume that the greatest element of $C'_i$ has colour
$c_i$ for $i = 1, \ldots, r - 1$. Now $C''_i = T_i \cup C'_i$ is a chain for $i = 1, \ldots, r - 1$, since the
greatest element of $C'_i$ lies below $T_i$. Finally, $C''_r = T_r \cup \{x\}$ is a chain, since $x$ lies
below all $T_i$. So we have the required partition into chains $C''_1, \ldots, C''_r$.
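
Dilworth's Theorem can be checked numerically on small posets. The sketch below does not follow the recolouring argument of the proof. Instead it uses a different, standard technique: the least number of chains in a partition equals the number of points minus the size of a maximum matching in the bipartite graph joining $x$ to $y$ whenever $x < y$. This assumes the strict order passed in is transitive, and the function names are hypothetical.

```python
from itertools import combinations

def max_antichain_size(elements, less_than):
    """Size of a largest antichain, found by exhaustive search."""
    for k in range(len(elements), 0, -1):
        for subset in combinations(elements, k):
            if all(not less_than(x, y) and not less_than(y, x)
                   for x, y in combinations(subset, 2)):
                return k
    return 0

def min_chain_partition_size(elements, less_than):
    """Number of points minus a maximum matching in the graph x -> y (x < y)."""
    match = {}                            # y -> the x currently linked just below y

    def augment(x, seen):
        # Try to link x to some y above it, re-routing earlier links if necessary.
        for y in elements:
            if less_than(x, y) and y not in seen:
                seen.add(y)
                if y not in match or augment(match[y], seen):
                    match[y] = x
                    return True
        return False

    matched = sum(augment(x, set()) for x in elements)
    return len(elements) - matched
```

On the divisors of 60 ordered by divisibility, both functions return 4, as the theorem predicts.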


Perhaps the relative difficulty of this theorem is more understandable when you 
realise that it contains Hall’s Marriage Theorem (6.2.1) as a special case! 
Suppose that $A_1, \ldots, A_n$ are subsets of $X$ satisfying Hall’s Condition (HC):
\[ |A(J)| \ge |J| \quad \text{for } J \subseteq \{1, \ldots, n\}, \]
where $A(J) = \bigcup_{j \in J} A_j$. We construct a poset $P$ as follows. The elements of $P$
are the points of $X$ and symbols $y_1, \ldots, y_n$, with $x < y_i$ if $x \in A_i$, and no other
comparabilities. We set $Y = \{y_1, \ldots, y_n\}$. Now $X$ is an antichain in this poset.
We claim that there is no larger antichain. For let $S$ be an antichain, and set
$J = \{j : y_j \in S\}$. Then $S$ contains no element of $A(J)$; so
\[ |S| \le |J| + |X| - |A(J)| \le |X|, \]
by (HC).

Now Dilworth’s Theorem implies that $P$ can be partitioned into $|X|$ chains.
Each of these chains must contain a point of $X$. Let the chain through $y_i$ be $\{x_i, y_i\}$.
Then $(x_1, \ldots, x_n)$ is a system of distinct representatives for $(A_1, \ldots, A_n)$: for $x_i \in A_i$
(since $x_i < y_i$), and $x_i \ne x_j$ for $i \ne j$ (since the chains are disjoint).


12.6. Products and dimension 


Suppose that a number of objects are being compared on several different numeric 
attributes. If z is better than y on all these attributes, we are justified in saying that 
z beats y. But if z is better on some attributes and y on others, then, depending
on how the attributes are scaled or weighted, we might come to different conclusions
about their ordering, and it seems safest to say that z and y are incomparable in 
this case. 

Accordingly, let $(X_1, \le_1), \ldots, (X_n, \le_n)$ be posets. The direct product of these
posets is the poset $(X, \le)$, where
\[ X = X_1 \times \ldots \times X_n = \{(x_1, \ldots, x_n) : x_1 \in X_1, \ldots, x_n \in X_n\}, \]
and
\[ (x_1, \ldots, x_n) \le (y_1, \ldots, y_n) \text{ if and only if } x_i \le_i y_i \text{ for } i = 1, \ldots, n. \]


It is a simple matter to show that (X, <) is indeed a poset. Moreover, a direct 
product of lattices is a lattice, with meet and join defined by 


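\[ (x_1, \ldots, x_n) \wedge (y_1, \ldots, y_n) = (x_1 \wedge_1 y_1, \ldots, x_n \wedge_n y_n), \]
\[ (x_1, \ldots, x_n) \vee (y_1, \ldots, y_n) = (x_1 \vee_1 y_1, \ldots, x_n \vee_n y_n), \]
and $0 = (0_1, \ldots, 0_n)$, $1 = (1_1, \ldots, 1_n)$.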
Some familiar posets are direct products. Notably: 


(12.6.1) Proposition. (a) If |X| = n, then the power-set lattice P(X) is the direct 
product of n copies of the two-element lattice {0,1}. 

(b) If $n = p_1^{a_1} \cdots p_r^{a_r}$, where $p_1, \ldots, p_r$ are distinct primes, then the lattice $D(n)$ of
divisors of $n$ is isomorphic to the direct product of the lattices $D(p_1^{a_1}), \ldots, D(p_r^{a_r})$.


Proof. (a) Let $X = \{x_1, \ldots, x_n\}$. We identify any subset $Y$ of $X$ with its characteristic
function $(e_1, \ldots, e_n)$, where $e_i = 1$ if $x_i \in Y$, $e_i = 0$ otherwise. This is a bijection
between $P(X)$ and $\{0,1\}^n$. Moreover, if $Y$ and $Z$ have characteristic functions
$(e_1, \ldots, e_n)$ and $(f_1, \ldots, f_n)$ respectively, then
\[ Y \subseteq Z \iff (\forall i)\,(x_i \in Y \Rightarrow x_i \in Z) \iff (\forall i)\,(e_i = 1 \Rightarrow f_i = 1) \iff (\forall i)\,(e_i \le f_i), \]
so the map is an isomorphism.

(b) is an exercise. 
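
Without giving away the exercise, part (b) can at least be tested numerically. The sketch below (hypothetical function names) sends each divisor of $n$ to its vector of prime exponents and checks that divisibility corresponds to the coordinatewise order on the product of chains.

```python
def prime_factorisation(n):
    """Return {prime: exponent} for n >= 1 by trial division."""
    factors, p = {}, 2
    while n > 1:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    return factors

def exponent_vector(d, primes):
    """Exponents of the given primes in d."""
    vec = []
    for p in primes:
        e = 0
        while d % p == 0:
            d //= p
            e += 1
        vec.append(e)
    return tuple(vec)

def is_product_isomorphism(n):
    """Check that d -> exponent vector is an order-isomorphism from D(n) onto the product."""
    factors = prime_factorisation(n)
    primes = sorted(factors)
    divs = [d for d in range(1, n + 1) if n % d == 0]
    vecs = {d: exponent_vector(d, primes) for d in divs}
    expected = 1
    for a in factors.values():
        expected *= a + 1                 # size of the product of the chains D(p^a)
    bijective = len(set(vecs.values())) == len(divs) == expected
    order_ok = all((b % a == 0) == all(x <= y for x, y in zip(vecs[a], vecs[b]))
                   for a in divs for b in divs)
    return bijective and order_ok
```

A check of this kind is evidence rather than proof, but it shows what the isomorphism in (b) looks like.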


The concept of direct product gives us a measure of how far a poset is from 
being totally ordered. Essentially, this is the smallest number of different numerical 
attributes required to produce the partial order by the recipe at the start of this 
section. Formally, we define the dimension of a poset P to be the smallest integer 
d such that P can be embedded as a sub-poset of the direct product of d totally 
ordered sets. 


(12.6.2) Proposition. The poset $P(X)$ has dimension $|X|$. The dimension of the poset
D(n) is equal to the number of distinct prime divisors of n. 


Proof. We found isomorphisms from these posets to products of the stated number
of totally ordered sets. It is necessary to show that they cannot be embedded in
products of fewer total orders. More generally, we claim that the product of $n$ total
orders, each with more than one point, has dimension $n$. The result is clear if $n \le 2$,
so we may suppose that $n \ge 3$.

We consider a special two-level poset, the standard poset, with 2n vertices 


\[ a_1, \ldots, a_n, b_1, \ldots, b_n; \]
the comparabilities are $a_i < b_j$ if (and only if) $i \ne j$.


STEP 1. If $P$ is a sub-poset of $Q$, then $\dim(P) \le \dim(Q)$.


STEP 2. If $P$ is the direct product of $n$ total orders, each with at least two points,
then $P$ contains the $2n$-point standard poset. For suppose that $u_i, v_i$ are elements
of the $i$th factor, with $u_i < v_i$ for $i = 1, \ldots, n$. Now let $a_i$ be the $n$-tuple with $i$th
entry $v_i$ and $j$th entry $u_j$ for $j \ne i$; and let $b_i$ be the $n$-tuple with $i$th entry $u_i$ and $j$th
entry $v_j$ for $j \ne i$. It is readily checked that these elements form a standard poset.


STEP 3. The dimension of a $2n$-point standard poset is $n$. Clearly it is not greater
than $n$. Suppose that the standard poset is embedded in the product of $m$ total
orders. For each $i$, there exists a $j$ such that the $j$th coordinate of $b_i$ is strictly smaller
than that of any other point $b_k$, since otherwise $a_i$ (whose coordinates are all smaller
than the corresponding coordinates of $b_k$ for $k \ne i$) would lie below $b_i$. Clearly this
requires at least $n$ coordinates.


EXAMPLE. The poset N has dimension 2: it can be represented by the four points 
(2,0), (0,1), (3,2), (1,3). 
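
The example can be verified mechanically. In this sketch the labels a, b, c, d attached to the four points are an assumption made for illustration; the code lists the strict comparabilities induced by the product order on pairs of integers.

```python
# Assumed labels for the four points of the example.
POINTS = {"a": (2, 0), "b": (0, 1), "c": (3, 2), "d": (1, 3)}

def product_less(p, q):
    """Strict product order: p <= q in every coordinate, and p != q."""
    return p != q and all(x <= y for x, y in zip(p, q))

def strict_relations(points):
    """All pairs (u, v) of labels with points[u] strictly below points[v]."""
    return {(u, v) for u in points for v in points
            if product_less(points[u], points[v])}
```

The only comparabilities are a < c, b < c and b < d, which is the zigzag shape of N.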


It’s not obvious that a finite poset has finite dimension; but this is indeed true. 


(12.6.3) Theorem. The dimension of a finite poset P is finite, and is not greater than 
the number of linear extensions of P. 


Proof. Let $P = (X, R)$, and let $(X, R_1), \ldots, (X, R_k)$ be the linear extensions of $P$.
We map $X$ to the direct product of these total orders by the diagonal embedding:
$x \mapsto (x, x, \ldots, x)$. Now, if $(x,y) \in R$, then $(x,y) \in R_i$ for $i = 1, \ldots, k$; so
$(x, \ldots, x) \le (y, \ldots, y)$ in the direct product. Suppose that $x$ and $y$ are incomparable.
The proof of Theorem 12.2.1 shows that there is a linear extension $R_i$ of $R$ with
$(x,y) \in R_i$, and another linear extension $R_j$ with $(y,x) \in R_j$; thus, $(x, \ldots, x)$ and
$(y, \ldots, y)$ are incomparable. So the diagonal embedding is an isomorphism.


12.7. The Möbius function of a poset


An $n \times n$ real matrix $A = (a_{ij})$ can be regarded as a function $a$ from $N \times N$ to $\mathbb{R}$,
where $N = \{1, 2, \ldots, n\}$, whose values are given by $a(i,j) = a_{ij}$. From this point of
view, the fact that $N$ is an ordered set leads us to consider the matrices or functions
‘supported’ by the order, that is, functions which satisfy $a_{ij} = 0$ unless $i \le j$: these
are precisely the upper triangular matrices. They form an algebra: that is, they are 
closed under matrix multiplication as well as addition and scalar multiplication. In 
particular, an upper triangular matrix is invertible if and only if its diagonal entries 
are all non-zero. We will extend this point of view to an arbitrary finite poset. 


Let $P = (X, \le)$ be a finite poset. The incidence algebra $I(P)$ of $P$ is the set
of functions $f : X \times X \to \mathbb{R}$ which satisfy $f(x,y) = 0$ unless $x \le y$. Addition and
scalar multiplication are defined pointwise, and two functions are multiplied by the
rule
\[ fg(x,y) = \sum_{x \le z \le y} f(x,z)\, g(z,y). \]


(12.7.1) Proposition. If $|X| = n$, the incidence algebra $I(P)$ is isomorphic to a
subalgebra of the algebra of upper triangular matrices. A function $f$ is invertible if
and only if $f(x,x) \ne 0$ for all $x \in X$.


Proof. We take a linear extension of $P$ (Theorem 12.2.1); that is, we number the
elements of $X$ as $x_1, \ldots, x_n$ so that, if $x_i < x_j$, then $i < j$. Now we map $f \in I(P)$
to the matrix $A = (a_{ij})$ where $a_{ij} = f(x_i, x_j)$. Clearly, $A$ is upper triangular. Also,
the map is an isomorphism, since if matrices $A$ and $B$ correspond to $f$ and $g$, then
the matrix corresponding to $fg$ has $(i,j)$ entry
\[ \sum_{x_i \le x_k \le x_j} f(x_i, x_k)\, g(x_k, x_j) = \sum_{i \le k \le j} a_{ik} b_{kj} = \sum_{1 \le k \le n} a_{ik} b_{kj}, \]
the last equality holding because, unless $i \le k \le j$, either $a_{ik}$ or $b_{kj}$ is zero. In
particular, $fg(x,y) = 0$ unless $x \le y$ (since there are no terms in the sum); so
$fg \in I(P)$.

Finally, note that the values f(z,x) are the diagonal elements of the matrix 
corresponding to f. So a function satisfying the condition f(z, x) # 0 corresponds 
to an invertible matrix. We need to know that the inverse function does lie in $I(P)$.
For this purpose, we give an algorithm to compute an inverse function; the fact that 
the inverse is unique then implies the result. 

For $x \le y$, we define the interval $[x,y]$ to be the set $\{z : x \le z \le y\}$, or the poset
induced on this set. Now suppose that $f(x,x) \ne 0$ for all $x \in X$. We calculate the
values $g(x,y)$ of a function $g \in I(P)$ by induction on the cardinality of $[x,y]$, as
follows:

If $|[x,y]| = 0$, then $x \not\le y$, and we set $g(x,y) = 0$.
If $|[x,y]| = 1$, then $x = y$, and we set $g(x,x) = f(x,x)^{-1}$.
If $|[x,y]| > 1$, we set
\[ g(x,y) = -f(x,x)^{-1} \Bigl( \sum_{x < z \le y} f(x,z)\, g(z,y) \Bigr). \]

The function $g$ is well-defined, because the values of $g$ on the right-hand side
of the last equation have the form $g(z,y)$, where $x < z \le y$; so the interval $[z,y]$
is properly contained in $[x,y]$, and the values are defined by induction. Clearly
$g \in I(P)$. A short calculation shows that, indeed, $fg(x,x) = 1$ and $fg(x,y) = 0$ if
$x \ne y$; so $g$ is the inverse of $f$.
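
The three-case recursion translates almost line for line into code. The following is a minimal sketch with a hypothetical function name, assuming the poset is given as a list of hashable elements and a predicate `leq`, and using exact rational arithmetic so that the result can be checked without rounding.

```python
from fractions import Fraction
from functools import lru_cache

def inverse(elements, leq, f):
    """Return g with sum over x <= z <= y of f(x,z) g(z,y) = [x == y], assuming f(x,x) != 0."""
    @lru_cache(maxsize=None)
    def g(x, y):
        if not leq(x, y):                 # empty interval
            return Fraction(0)
        if x == y:                        # one-point interval
            return 1 / Fraction(f(x, x))
        # Larger interval: the values g(z, y) with x < z <= y are already determined.
        total = sum(f(x, z) * g(z, y) for z in elements
                    if leq(x, z) and leq(z, y) and z != x)
        return -total / Fraction(f(x, x))
    return g
```

Taking `f` to be the characteristic function of the order produces the Möbius function introduced below; on the divisors of 12 the values of `g(1, d)` are 1, -1, -1, 0, 1, 0 for d = 1, 2, 3, 4, 6, 12.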


Three particular elements of $I(P)$ are specially important. The first is the
function $e$, the characteristic function of equality:
\[ e(x,y) = \begin{cases} 1 & \text{if } x = y, \\ 0 & \text{otherwise;} \end{cases} \]
this is the identity element of $I(P)$, corresponding to the identity matrix. Next is the
function $i$, the characteristic function of the partial order:
\[ i(x,y) = \begin{cases} 1 & \text{if } x \le y, \\ 0 & \text{otherwise.} \end{cases} \]
Finally, the Möbius function $\mu$ of the poset is the inverse of the function $i$. That is,
it is characterised by the equation
\[ \sum_{x \le z \le y} \mu(x,z) = \begin{cases} 1 & \text{if } x = y, \\ 0 & \text{otherwise.} \end{cases} \]


(12.7.2) Proposition. The Möbius function is integer-valued.


Proof. Examine the proof of (12.7.1), which gives a method for calculating the
inverse of a function: take $f = i$ there. Since $i(x,x) = 1$ for all $x$, the factor $i(x,x)^{-1}$
is equal to 1. Now $\mu(x,y)$ is a linear combination of values of $\mu(z,y)$ with integer
coefficients (in fact, all equal to $-1$), where $x < z \le y$; by induction, $\mu(x,y)$ is an
integer. (The induction starts with $\mu(x,x) = 1$.)

Note that the value of the Möbius function at $(x,y)$ depends only on the poset
$[x,y]$; points outside this interval don’t affect the value. For the record, we translate
the defining property of the Möbius function as follows. This result is referred to as
Möbius inversion in the poset $P$.


(12.7.3) Proposition. Let $f, g$ be elements of $I(P)$. Then the following are equivalent:
(a) $f(x,y) = \sum_{x \le z \le y} g(x,z)$;
(b) $g(x,y) = \sum_{x \le z \le y} f(x,z)\, \mu(z,y)$.


For a simple but important example, we have 


(12.7.4) Proposition. Let $P$ be a totally ordered set. Then the Möbius function of $P$
is
\[ \mu(x,y) = \begin{cases} 1 & \text{if } x = y, \\ -1 & \text{if } y \text{ covers } x, \\ 0 & \text{otherwise.} \end{cases} \]

Proof. Indeed, in any poset, if $y$ covers $x$, then $\mu(x,y) = -1$, since only the term
$z = y$ occurs in the sum in (12.7.1). Now, if $x < y$ and $y$ is not the unique element
$z$ which covers $x$, a simple induction shows that $\mu(x,y) = 0$. (This induction begins
with the case where $y$ covers $z$; then $\mu(x,y) = -(\mu(z,y) + \mu(y,y)) = 0$.)

Conveniently, the Möbius function of a direct product of posets is equal to the
product of the Möbius functions of the factors:


(12.7.5) Proposition. Let $P_1, \ldots, P_k$ be posets, and let $P = P_1 \times \ldots \times P_k$. Then the
Möbius function of $P$ is given by
\[ \mu((x_1, \ldots, x_k), (y_1, \ldots, y_k)) = \prod_{i=1}^{k} \mu(x_i, y_i). \]

Proof. Since the Möbius function is unique, it suffices to prove that the right-hand
side does have the property that
\[ \sum_{x \le z \le y} \mu(x,z) = \begin{cases} 1 & \text{if } x = y, \\ 0 & \text{otherwise.} \end{cases} \]
If $z = (z_1, \ldots, z_k)$, then $x \le z \le y$ if and only if $x_i \le z_i \le y_i$ for $i = 1, \ldots, k$; so the
sum on the left is over the product of the intervals $[x_i, y_i]$. Then this sum factorises
as shown in the proposition.


From this, we can calculate the Möbius functions of two important posets.

(12.7.6) Theorem. (a) The Möbius function of the Boolean lattice $P(X)$ is given by
\[ \mu(Y, Z) = \begin{cases} (-1)^{|Z \setminus Y|} & \text{if } Y \subseteq Z, \\ 0 & \text{otherwise.} \end{cases} \]
(b) The Möbius function of the lattice $D(n)$ of divisors of $n$ is given by
\[ \mu(y, z) = \begin{cases} (-1)^d & \text{if } z/y \text{ is the product of } d \text{ distinct primes,} \\ 0 & \text{otherwise.} \end{cases} \]
This is immediate from (12.7.4), (12.7.5) and (12.6.1).
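
Part (b) can be compared with a direct computation. The sketch below (hypothetical function names) evaluates $\mu(y,z)$ in $D(n)$ from the recursion $\mu(y,z) = -\sum_{y < w \le z} \mu(w,z)$ used in the proof of (12.7.2), and separately evaluates the closed formula.

```python
from functools import lru_cache

def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]

@lru_cache(maxsize=None)
def mobius_interval(y, z):
    """mu(y, z) in the divisor lattice, computed from the recursion."""
    if z % y != 0:
        return 0
    if y == z:
        return 1
    return -sum(mobius_interval(w, z) for w in divisors(z)
                if w % y == 0 and w != y)

def mobius_formula(m):
    """(-1)^d if m is a product of d distinct primes, and 0 otherwise."""
    d, p = 0, 2
    while m > 1:
        if m % p == 0:
            m //= p
            if m % p == 0:                # p^2 divides the original m
                return 0
            d += 1
        p += 1
    return (-1) ** d
```

For every pair of divisors $y \mid z$ of 360, `mobius_interval(y, z)` agrees with `mobius_formula(z // y)`.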
REMARK 1. Both $P(X)$ and $D(n)$ have the property that any interval is isomorphic
to a lattice of the same form: $[Y, Z] \cong P(Z \setminus Y)$ in case (a), and $[y, z] \cong D(z/y)$
in (b). Thus, in these cases, we can regard the Möbius function as having a single
argument, setting $\mu(Y) = \mu(\emptyset, Y)$ in $P(X)$, and $\mu(y) = \mu(1, y)$ in $D(n)$. The values
of these functions are then given by
\[ \mu(Y) = (-1)^{|Y|} \quad \text{in } P(X), \]
\[ \mu(y) = \begin{cases} (-1)^d & \text{if } y = p_1 \cdots p_d, \\ 0 & \text{otherwise} \end{cases} \quad \text{in } D(n), \]
where $p_1, \ldots, p_d$ denote distinct primes. The latter function is the classical Möbius
function met with in number theory.

REMARK 2. Using the form of the Möbius function for $P(X)$, the statement of (12.7.3)
translates precisely into (5.2.2), an equivalent form of the Principle of Inclusion and
Exclusion. Thus Möbius inversion is a generalisation of PIE.

REMARK 3. The ‘classical’ form of Möbius inversion reads as follows.


Let $f, g$ be functions on the positive integers. Then the following
are equivalent:

(a) $f(n) = \sum_{d \mid n} g(d)$;

(b) $g(n) = \sum_{d \mid n} f(d)\, \mu(n/d)$.


Here is an application. In Section 4.7, we found that the number $a_n$ of monic
irreducible polynomials of degree $n$ over a field with $q$ elements satisfies the recurrence
relation
\[ \sum_{d \mid n} d\, a_d = q^n. \]
By Möbius inversion (applied with $f(n) = q^n$, $g(n) = n a_n$), we find a formula for
$a_n$:
\[ a_n = \frac{1}{n} \sum_{d \mid n} q^d \mu(n/d), \]
where $\mu$ is here the classical Möbius function.
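
The formula is easy to evaluate. The sketch below uses hypothetical function names and assumes $q$ is a prime power, so that a field with $q$ elements exists. It computes $a_n$ and can be used to confirm the recurrence it was derived from.

```python
def mobius(m):
    """Classical Möbius function: (-1)^d for a product of d distinct primes, else 0."""
    d, p = 0, 2
    while m > 1:
        if m % p == 0:
            m //= p
            if m % p == 0:
                return 0
            d += 1
        p += 1
    return (-1) ** d

def count_irreducible(q, n):
    """a_n = (1/n) * sum over d | n of q^d * mu(n/d)."""
    total = sum(q ** d * mobius(n // d) for d in range(1, n + 1) if n % d == 0)
    assert total % n == 0                 # the sum is always divisible by n
    return total // n
```

For $q = 2$ the first values are $a_1, \ldots, a_6 = 2, 1, 2, 3, 6, 9$; for instance, the single irreducible quadratic over the field with two elements is $x^2 + x + 1$.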


12.8. Matroids 


The notion of independence shows up in many different places in mathematics,
some familiar, some less so. We'll see that it obeys the same laws in these different
settings:
• Linear independence in vector spaces.
• A closely related notion occurs in projective or affine spaces. Any set of $k$ points
in such a space lies in a flat of dimension at most $k - 1$; it is called independent
if it lies in no flat of dimension smaller than $k - 1$.
• In a graph $(V, E)$, a set $E'$ of edges is acyclic if $(V, E')$ is a forest.⁵
• Let $\mathcal{F}$ be a family of subsets of $X$. A set $\{x_1, \ldots, x_k\}$ of points of $X$ is a partial
transversal for $\mathcal{F}$ if there are distinct sets $A_1, \ldots, A_k \in \mathcal{F}$ such that $x_i \in A_i$ for
$i = 1, \ldots, k$; in other words, $(x_1, \ldots, x_k)$ is a SDR for a subfamily of $\mathcal{F}$.


The common concept here is that of a matroid. A matroid is a pair $(X, \mathcal{I})$,
where $\mathcal{I}$ is a non-empty family of subsets of $X$ having the properties:
• Hereditary property: if $Y \in \mathcal{I}$ and $Z \subseteq Y$, then $Z \in \mathcal{I}$;
• Exchange axiom: if $Y, Z \in \mathcal{I}$ and $|Z| > |Y|$, then there exists $z \in Z \setminus Y$ such that
$Y \cup \{z\} \in \mathcal{I}$.


The members of $\mathcal{I}$ are called independent sets. In fact, there are many other ways
to define a matroid, and the beginner is often bemused by the many axiom systems.
As a compromise, I will describe some other structures which are equivalent to the
notion of a matroid, but without giving all the axiomatisations.


It follows immediately from the second matroid axiom that any two maximal
independent sets have the same cardinality. This number is called the rank of
the matroid, and a maximal independent set is called a basis. Dually, a minimal
dependent set is called a cycle. If $Y$ is any subset of the point set $X$ of a matroid,
then the members of $\mathcal{I}$ contained in $Y$ clearly satisfy the matroid axioms, so define
a matroid on $Y$. Let $\rho(Y)$ denote its rank, so that $\rho$ is a function from $P(X)$ to the
non-negative integers. A set $Y$ is called closed if $\rho(Y \cup \{x\}) > \rho(Y)$ for all $x \notin Y$.
The closure $\sigma(Y)$ of an arbitrary subset $Y$ is the smallest closed set containing it.


(12.8.1) Proposition. A matroid on X is determined by any of the following: the 
bases; the rank function; the cycles; the closed sets; the closure operator on P(X). 


5 The graph may contain loops or multiple edges. By convention, a forest has no loops, and contains
at most one edge joining any pair of vertices.

6 An alternative term is ‘combinatorial pregeometry’. To the surprise of its proponents, but perhaps
of nobody else, this term has not become standard.

Proof. As we have explained, each of these structures is determined by the independent
sets. We must show the converse.

The axioms imply that a set is independent if and only if it is contained in a 
basis. Obviously, a set is independent if and only if it contains no cycle. Also, a set 
is independent if and only if its rank is equal to its cardinality. 

For the last two, we first observe that a set is closed if and only if it is equal to 
its closure, so the closed sets and the closure operator carry the same information. 
Moreover, the rank of a set is equal to the rank of its closure, so it is enough to 
determine the rank of the closed sets. Now the rank of a closed set Y is the length 
of any maximal chain of closed sets with greatest element Y. 


(12.8.2) Proposition. Each of the following examples defines a matroid: 
• X is a subset of a vector space, ℐ the set of linearly independent subsets of X;
• X is a subset of a projective or affine space, ℐ the set of independent subsets
of X;
• X is the edge set of a graph, ℐ the set of acyclic subsets of X;
• ℐ is the set of partial transversals of a family of subsets of X.


Proof. The proofs show various similarities and differences, so I will sketch the first,
third and fourth. (The second is almost the same as the first.)

1. Let X be a set of vectors. Clearly, any subset of a linearly independent
subset is linearly independent. Suppose that Y and Z are linearly independent, with
|Z| > |Y|. Then dim⟨Z⟩ > dim⟨Y⟩, so Z ⊄ ⟨Y⟩. Thus, there is a vector z ∈ Z not
contained in ⟨Y⟩, and Y ∪ {z} is linearly independent.

3. Let X be the edge set of a graph on the vertex set V. Clearly a subset
of an acyclic subset is acyclic. If Y is acyclic, then the number of connected
components of (V, Y) is |V| - |Y|, by (11.2.1). Thus, if |Z| > |Y|, then (V, Z)
has fewer components than (V, Y), and so some edge z ∈ Z is not contained within
a component of (V, Y); thus Y ∪ {z} is acyclic.⁷

4. Any subset of a partial transversal is clearly a partial transversal. Suppose
that Y and Z are partial transversals, with |Y| < |Z|. Let A_y be the set represented
by y ∈ Y, and B_z the set represented by z ∈ Z. We consider, for each z ∈ Z, the
set X' = Y ∪ {z}, and the subsets A'_y = A_y ∩ X' and B'_z = B_z ∩ X'. If this family
of sets has a SDR, its elements must be all the points of X', which is thus a partial
transversal, and we are done. So we can suppose that this fails for all z. But this
means that some n + 1 of these sets contain only n elements of X'. These n + 1 sets
must include B'_z, since any subfamily of the A'_y has a SDR. This means that z ∈ Y
for all z ∈ Z, a contradiction, since |Z| > |Y|.


As usual with abstract concepts, the point of this result is that a single argument 
suffices to prove a theorem applicable in several different fields. We should look 
to these fields for results which can be formulated in terms of independent sets. 
One such is the greedy algorithm for the minimal connector (Section 11.3), which
extends to the minimum-weight basis in a matroid whose elements have weights (see
Exercise 13).


7 A cycle in this matroid is the edge set of a circuit in the graph (possibly a loop or two parallel
edges); hence the name.
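
The greedy algorithm for a weighted matroid can be sketched in a few lines once an independence test is available. The Python fragment below is only an illustration under stated assumptions: the matroid is taken to be a set of vectors over GF(2), each stored as an integer bitmask, and the names is_independent, greedy_min_basis and WEIGHTED, together with the weights, are hypothetical. Any other independence oracle could be passed in its place.

```python
def is_independent(vectors):
    """Linear independence over GF(2); each vector is an int used as a bitmask."""
    reduced = []
    for v in vectors:
        for b in reduced:
            v = min(v, v ^ b)  # clears the leading bit of b from v if it is set
        if v == 0:
            return False  # v lies in the span of the earlier vectors
        reduced.append(v)
    return True


def greedy_min_basis(weighted, independent=is_independent):
    """weighted maps a name to (element, weight); returns the names of a basis.

    Elements are scanned in order of increasing weight, and an element is kept
    whenever adding it to the current selection preserves independence.
    """
    chosen = []
    for name in sorted(weighted, key=lambda k: weighted[k][1]):
        trial = [weighted[k][0] for k in chosen] + [weighted[name][0]]
        if independent(trial):
            chosen.append(name)
    return chosen


# Hypothetical data: four vectors in GF(2)^3 with weights 4, 1, 2, 3.
WEIGHTED = {"u": (0b110, 4), "v": (0b011, 1), "w": (0b101, 2), "x": (0b111, 3)}
basis = greedy_min_basis(WEIGHTED)
```

Here u = v + w, so u is rejected when it is finally considered, and the basis found consists of v, w and x.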


The closed sets of a matroid form a lattice, where meet is intersection, and the 
join of two sets is the closure of their union. Boolean lattices and finite projective 
and affine geometries form special cases of these so-called geometric lattices, which 
have been axiomatised and studied in their own right. We make just one observation. 

A matroid is called geometric if the empty set and all singletons are closed. 
Now it is possible to pass from any matroid to a geometric matroid in a canonical 
way, which parallels exactly the procedure for passing from a vector space to the 
corresponding projective space (Chapter 9). 


STEP 1. By removing all points in the closure of the empty set, we produce a matroid 
in which the empty set is closed. 


STEP 2. Now write x ~ y if x = y or {x, y} is dependent (in other words, if {x, y}
has rank 1). It follows from the exchange axiom that this is an equivalence relation.
There is a matroid induced in a natural way on the set of equivalence classes. (Any 
closed set is a union of equivalence classes.) 

In the case of a vector space V, Step 1 removes the zero vector, and Step 2 calls 
two vectors equivalent if one is a scalar multiple of the other; so the equivalence 
classes are the 1-dimensional subspaces, that is, the points of the projective geometry. 
In the case of a graph, Step 1 removes loops and Step 2 removes multiple edges, 
leaving a simple graph. 

Now, in general, geometric matroids and geometric lattices are equivalent con- 
cepts: the points of the geometric matroid are the elements of the lattice which cover 
0; an arbitrary element of the lattice can be identified with the set of points lying 
below it; and, as explained earlier, we can recover the rank function, and hence the 
independent sets, from the closed sets. 


We conclude this section with a generalisation which pulls itself up by the 
bootstraps. Our fourth example of a matroid arose from the partial transversals of
a family {A_1, ..., A_n} of subsets of X, that is, the sets of points supporting SDRs
of subsets of the family. Now we suppose that there is already a matroid (X, ℐ)
defined on the point set. We ask: Is there an independent transversal? The answer is 
formally similar to Hall’s Theorem (of which it is a generalisation). 


(12.8.4) Theorem. Let A_1, ..., A_n be subsets of X, and let (X, ℐ) be a matroid.
Then there is an independent transversal to the family if and only if, for every
J ⊆ {1, ..., n},

ρ(A(J)) ≥ |J|.


REMARK. Hall’s Theorem corresponds to the case where the matroid is trivial (every
set independent), so that ρ(Y) = |Y| for any subset Y of X.


Proof. If there is an independent transversal, then for any J ⊆ {1, ..., n}, A(J)
contains an independent set of size |J|, so its rank is at least this large. The converse
is an exercise, which can be solved by re-writing (with care) the proof of Hall’s
Theorem given in Section 6.2 (or, indeed, almost any other of the standard textbook
proofs).


12.9. Project: Arrow’s Theorem


One of the problems of politics involves ‘averaging out’ individual preferences to 
reach decisions acceptable to society as a whole. In this section, we prove Arrow’s 
Theorem, which shows that this is indeed a difficult task! 


We suppose that I is a society consisting of a set of n individuals. These individuals are to be
offered a choice among a set X of options, for example, by a referendum. We assume that each
member i of the society has made up her/his mind about the relative worth of the options. We can
describe this by a total order <_i on X, for each i ∈ I. A social choice function is a rule which, given
the ‘individual preferences’ <_i for each i ∈ I, comes up with a ‘social preference’ < on X, subject
to four conditions listed (and justified) below. In other words, it is a function from the set of all
n-tuples of total orders on X to the set of total orders on X, satisfying Axioms (A1)-(A4) below.
Arrow’s Theorem asserts that, if there are at least three options, then no social choice function is possible.


(A1) If x < y (in the social preference), then the same remains true if the 
individual preferences are changed in y’s favour. 


(This means that, if <'_i (i ∈ I) is another system of individual preferences satisfying
u <_i v ⇔ u <'_i v for all u, v ≠ y, and
u <_i y ⇒ u <'_i y for all u,
and <' is the corresponding social preference, then x <' y holds.)


(A2) If Y ⊆ X and two sets {<_i}, {<'_i} of individual preferences on X have the
property that <_i and <'_i induce the same ordering on Y for each i ∈ I, then the
corresponding social preferences < and <' induce the same ordering on Y.


(This is the principle of irrelevant options, and asserts that the working of social choice should 
not be affected if some of the options are struck out.) 


(A3) For any distinct x, y ∈ X, there is some system of individual preferences
for which the corresponding social preference has x < y.


(In other words, it should be possible for society to prefer y to x if enough individuals do so. In
fact, it follows from (A1)-(A3) that, if x <_i y for all i ∈ I, then x < y: that is, if everybody prefers y
to x, then society does too.)


(A4) There is no individual i such that < coincides with <_i for all systems of
individual preferences.


This axiom requires that there should not be a dictator whose opinions prevail against all
opposition!


(12.9.1) Arrow’s Theorem. If |X| ≥ 3, then no social choice function exists.


Proof. Suppose that we have a social choice function. If (x, y) is an ordered pair of distinct options,
we say that a set J of individuals is (x, y)-decisive if, whenever all members of J prefer y to x, then
so does the social order; formally, if x <_i y for all i ∈ J, then x < y. Further, we say that J is
decisive if it is (x, y)-decisive for some distinct x, y. We claimed after the statement of (A3) that the
whole society I is (x, y)-decisive for all x, y; let us first prove this. By (A2), we can suppose that x
and y are the only options. Now by (A3), there is some system of individual preferences which causes
x < y to hold; and by (A1), this remains true if we alter them so that all individuals prefer y to x.
Let J be a minimal decisive set. Then J ≠ ∅, by (A3). Suppose that J is (x, y)-decisive, and let
i be a member of J.

CLAIM. J = {i}. For let J' = J \ {i} and K = I \ J. Let z be a member of X different from x and y
(remember that |X| ≥ 3). Consider the individual preferences for which

x <_i y <_i z;
z <_j x <_j y for all j ∈ J';
y <_k z <_k x for all k ∈ K.

Then

x < y, since all members of the (x, y)-decisive set J think so;

y < z, since if z < y then J' is (z, y)-decisive, contradicting the minimality of J.

Hence x < z. But then {i} is (x, z)-decisive, since nobody else agrees with this order. By minimality
of J, we have J = {i}.

The proof shows, in fact, that {i} is (x, z)-decisive for all z ≠ x.


CLAIM. i is a dictator.

Choose w ≠ x, and z ≠ w, x. Consider the individual preferences in which

w <_i x <_i z,

z <_k w <_k x for all k ≠ i.

Then w < x (because everybody thinks so) and x < z (because i thinks so); so w < z, and {i} is
(w, z)-decisive. Finally, a similar argument (left to the reader) shows that {i} is (w, x)-decisive for
any w ≠ x. The claim is proved; and so Axiom (A4) is violated, proving the Theorem.


12.10. Exercises 


1. Describe the lattice E(P) for each of the posets P of Fig. 12.1 (other than N, see 
Fig. 12.2). 

2. Show that the pentagon and the three-point line are lattices, but are not 
distributive. 

REMARK. It can be shown that a lattice is distributive if and only if it contains 
neither the pentagon nor the three-point line as a sublattice. 


3. A poset P is a two-level poset if it is the union of two antichains U and L with 
no element of L greater than any element of U (so that the only comparabilities 
which occur are of the form l < u for l ∈ L, u ∈ U). In the deduction of Hall’s
Theorem from Dilworth’s, we used a two-level poset. Show, conversely, that the truth 
of Dilworth’s theorem for two-level posets can be deduced from Hall’s Theorem. 
[Hint: you may find the form of Hall’s Theorem given in Exercise 7 of Chapter 6 
useful.] 


4. Prove Proposition 12.5.1(b).


5. (a) Find the dimension of the pentagon and the three-point line. 
(b) Find all linear extensions of N, the pentagon, and the three-point line. 


6. (a) Show that any antichain (containing more than one point) has dimension 2. 
(b) The incidence poset of a graph Γ consists of the vertices and edges of Γ
ordered by inclusion, where an edge is regarded as a set of two vertices. Calculate 
the dimensions of the incidence posets of some small graphs. Show that the only 
connected graphs whose incidence posets have dimension 2 are the paths. 


7. Prove Theorem 12.8.4. 


8. Calculate the Möbius functions of the posets whose Hasse diagrams appear in
Fig. 12.1. 

9. Prove that the Möbius function of the lattice of subspaces of a vector space over
GF(q) is given by

\[ \mu(Y, Z) = \begin{cases} (-1)^k q^{k(k-1)/2} & \text{if } Y \subseteq Z, \\ 0 & \text{otherwise,} \end{cases} \]

where k = dim(Z) - dim(Y). [Hint: It suffices to consider the case when Y = {0}.
Now put t = -1 in the q-binomial Theorem (9.2.5).]


10. Let a, b be elements of a poset P. Prove that μ(a, b) = Σ_{i≥0} (-1)^i c_i, where c_i is
the number of chains

a = x_0 < x_1 < ... < x_i = b.

[Hint: Calling the right-hand side p(a, b), it suffices to show that Σ_{a≤x≤b} p(a, x) = 0
for a < b. Now the displayed chain contributes (-1)^i to p(a, b), and also (-1)^{i-1} to
p(a, x_{i-1}).]
11. Let (X, ℐ) be a matroid.

(a) Let Y ⊆ X. Prove that any basis for Y can be ‘extended’ to a basis for X.

(b) Let Y ⊆ X and let C be a cycle in Y. Prove that, for any x ∈ C, we have

ρ(Y \ {x}) = ρ(Y).

(c) Show that the rank function satisfies

ρ(Y ∪ Z) + ρ(Y ∩ Z) ≤ ρ(Y) + ρ(Z).

[Hint: Recall from linear algebra the argument which proves this (with equality)
for subspaces of a vector space.]

(d) Give an example where strict inequality holds in (c).

12. Let (X, ℐ) be a matroid, and I ∈ ℐ. Show that (X \ I, {J : J ∪ I ∈ ℐ}) is a
matroid. Prove that its rank function ρ' is given by ρ'(Y) = ρ(Y ∪ I) - ρ(I).
Hence show that any interval in a geometric lattice is a geometric lattice.
13. Prove that the greedy algorithm succeeds in finding a basis of minimum weight 
in a weighted matroid. 
14. Show that Arrow’s Theorem is false if there are just two options and at least 
three individuals in the society. [HINT: try democracy!] 

How is this result related to the contents of Section 7.1? 
15. Exploit the connection between terms in the disjunctive normal form and 
valuations to prove the disjunctive normal form theorem (12.4.2). 


16. (a) Show that the free distributive lattice with 3 generators has cardinality 20.
(b) COMPUTING PROJECT. Calculate the cardinality of the free distributive lattice 
for larger numbers of generators. 


13. More on partitions and 
permutations 


More and more I'm aware that the permutations are not unlimited. 


Russell Hoban, Turtle Diary (1975) 


TOPICS: Partition numbers; conjugacy classes of permutations;
diagrams and tableaux; symmetric polynomials


TECHNIQUES: Generating functions; proof of identities by counting
ALGORITHMS: Robinson–Schensted–Knuth correspondence


CROSS-REFERENCES: Permutations and partitions (Chapter 3); par- 
tial order (Chapter 12); [Catalan numbers, involutions (Chapter 4), 
Gaussian coefficients (Chapter 9); cycle index (Chapter 15)] 


In Chapter 3, we considered partitions and permutations of a finite set. Here, we 
look at the ‘unlabelled’ versions. These are partitions of an integer n, and conjugacy 
classes of permutations in the symmetric group S_n. It turns out that there are equal
numbers of these objects, and a rich interplay between them. The story also involves
symmetric functions and the character theory of S_n.


13.1. Partitions, diagrams, and conjugacy classes 


Let n be a positive integer. A partition of n is an expression for n as a sum of 
positive integers, where the order of the summands is unimportant.¹ We can arrange
the parts in order, with the largest first. Thus, there are five partitions of 4:


4 = 3+1 = 2+2 = 2+1+1 = 1+1+1+1.


As well as this obvious notation, a partition of n is sometimes written in the form
1^{a_1} 2^{a_2} ... n^{a_n}, where a_i is the number of parts equal to i, that is, the number of
occurrences of i as a term in the sum. The ‘factor’ i^{a_i} is not an exponential; the
integer i is merely a placeholder for the term a_i. If a_i = 0, the ‘factor’ can be omitted.
In this notation, the five partitions of 4 are


4^1, 3^1 1^1, 2^2, 2^1 1^2, 1^4.


1 If the order of the summands is significant, then the number of partitions of n is 2^{n-1} for n ≥ 1.
See Exercise 9(b) of Chapter 4.

We also use the notation λ ⊢ n to mean ‘λ is a partition of n’.

In addition, we use a pictorial representation of partitions by means of diagrams²
D(λ), defined as follows. Let λ be the partition n = n_1 + ... + n_k, with n_1 ≥ ... ≥ n_k.
The diagram of λ has k rows; the i-th row (numbering from the top³) contains n_i cells,
aligned at the left. Cells may be represented either by dots or by empty squares,
whichever is convenient; I will make use of both in appropriate places. Thus, the
diagram of the partition 7 = 3 + 2 + 2 or 3^1 2^2 is shown in Fig. 13.1.


• • •
• •
• •


Fig. 13.1. The diagram of a partition


Let λ ⊢ n. The conjugate or dual partition λ* of λ is the partition of n whose
diagram is the transpose (in the sense of matrices, that is, interchanging rows and
columns) of that of λ. For example, if λ = 3^1 2^2, as above, then λ* = 3^2 1^1. In general,
if λ = 1^{a_1} 2^{a_2} ... n^{a_n}, then the i-th part of λ* is a_i + a_{i+1} + ... + a_n, the number
of parts of λ which are at least i. Obviously, (λ*)* = λ.
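
Conjugation is easy to compute from the description in terms of parts: the i-th part of the conjugate counts the parts of the original partition which are at least i. A minimal Python sketch follows, assuming a partition is stored as a weakly decreasing list of positive integers; the function name conjugate is an illustrative choice.

```python
def conjugate(parts):
    """Conjugate of a partition given as a weakly decreasing list of positive parts."""
    largest = parts[0] if parts else 0
    # The i-th part of the conjugate is the number of parts of the original >= i,
    # that is, the length of the i-th column of the diagram.
    return [sum(1 for p in parts if p >= i) for i in range(1, largest + 1)]
```

For the partition 7 = 3 + 2 + 2 this gives 3 + 3 + 1, in agreement with the example, and applying the function twice returns the original list.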

Let p(n) be the number of partitions of n, the n-th partition number. (Check
that, for n = 1,2,3,4,5, we have p(n) = 1,2,3,5,7 respectively.) The function 
p is sometimes called the partition function. We prove first an expression for its 
generating function. By convention, p(0) = 1; the unique partition of 0 has no parts. 


(13.1.1) Theorem.

\[ \sum_{n \ge 0} p(n) t^n = \prod_{i \ge 1} (1 - t^i)^{-1}. \]


Proof. The right-hand side is

\[ \prod_{i \ge 1} (1 + t^i + t^{2i} + t^{3i} + \cdots). \]

A term in t^n in this product is obtained by selecting, say, t^{a_1} from the first factor,
t^{2a_2} from the second, and so on, with a_1 + 2a_2 + ... = n (so that 1^{a_1} 2^{a_2} ... ⊢ n). Each
partition of n gives a contribution of 1 to the coefficient of t^n, so this coefficient is
equal to p(n).

This expression for Π(t) = Σ p(n) t^n is not much use as it stands. But in the next
section, we'll see that it gives a recurrence relation for the partition numbers.⁴
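
The product formula can at least be checked numerically. The following Python sketch (an illustration, not part of the text; the name partition_numbers is hypothetical) multiplies the power series 1 by each factor 1/(1 - t^i) in turn, discarding all terms beyond t^N, and returns the list of coefficients.

```python
def partition_numbers(N):
    """Coefficients of t^0, ..., t^N in the product of 1/(1 - t^i) over i >= 1."""
    coeff = [1] + [0] * N  # start with the power series 1
    for i in range(1, N + 1):
        # Multiplying by 1/(1 - t^i) = 1 + t^i + t^(2i) + ... amounts to
        # adding the coefficient i places back, working upwards.
        for m in range(i, N + 1):
            coeff[m] += coeff[m - i]
    return coeff
```

The first few values returned are 1, 1, 2, 3, 5, 7, matching the partition numbers quoted above.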


2 These are also called Ferrers diagrams or Young diagrams. 

3 This convention corresponds to the indexing of matrices, where rows are numbered down the page 
and columns from left to right. An alternative convention is based on Cartesian coordinates, where 
the independent variable increases from left to right, and the dependent variable from bottom to top. 
According to Ian Macdonald, Symmetric Functions and Hall Polynomials, p. 2, ‘Readers who prefer
this convention should read this book upside down in a mirror’. Computer users will recognise the
difference between text and graphics output.

4 For analysts, we note that Π(t) is an analytic function of the complex variable t for |t| < 1, but
has a singularity at every root of unity, so it cannot be analytically continued outside the unit disc.
(The unit circle is a natural boundary.)

There are two convenient orderings defined on the set of all partitions of n. Let
λ, μ ⊢ n; say, λ: n = n_1 + ... + n_k, and μ: n = m_1 + ... + m_l, with the convention
that undefined parts are zero.

(a) We say that λ precedes μ in the reverse lexicographic order (r.l.o.) if, for some
i, we have n_j = m_j for j < i and n_i > m_i. (If we regard a partition as a ‘word’,
whose ‘letters’ are positive integers, this is the dictionary order of words with the
convention that large integers precede small ones in the ‘alphabet’.) This is a total
order.

(b) We say that λ precedes μ in the natural partial order (n.p.o.) (written λ < μ)
if λ ≠ μ and

n_1 + ... + n_i ≥ m_1 + ... + m_i

for all i ≥ 1.

For n ≤ 5, these two orders coincide. They differ first for n = 6, where 3^1 1^3 and
2^3 are incomparable in the n.p.o. (though the first precedes the second in the r.l.o.).
However, it is always true that r.l.o. is a linear extension (see Section 12.2) of n.p.o.: 


(13.1.2) Proposition. If λ < μ, then λ precedes μ in the reverse lexicographic order.


Proof. With the notation as before, choose i such that n_j = m_j for j < i but
n_i ≠ m_i. Since n_1 + ... + n_i ≥ m_1 + ... + m_i, we must have n_i > m_i.
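
Both orders are simple to implement, which makes it possible to confirm the statements above for small n. In the Python sketch below (an illustration only; all function names are hypothetical), a partition is a decreasing tuple of positive integers. Since two partitions of the same n can never be proper prefixes of one another, the reverse lexicographic order is just Python's built-in comparison of tuples, reversed.

```python
def partitions(n, largest=None):
    """Yield the partitions of n as decreasing tuples, in reverse lexicographic order."""
    if largest is None:
        largest = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, largest), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest


def prefix_sums(parts, length):
    """Partial sums n_1, n_1 + n_2, ..., treating undefined parts as zero."""
    total, sums = 0, []
    for i in range(length):
        total += parts[i] if i < len(parts) else 0
        sums.append(total)
    return sums


def precedes_npo(lam, mu):
    """lam precedes mu in the natural partial order: lam != mu and every
    partial sum of lam is at least the corresponding partial sum of mu."""
    k = max(len(lam), len(mu))
    pairs = zip(prefix_sums(lam, k), prefix_sums(mu, k))
    return lam != mu and all(a >= b for a, b in pairs)


def precedes_rlo(lam, mu):
    """lam precedes mu in reverse lexicographic order (partitions of the same n)."""
    return lam > mu
```

For example, precedes_npo((3, 1, 1, 1), (2, 2, 2)) and precedes_npo((2, 2, 2), (3, 1, 1, 1)) are both false, while precedes_rlo((3, 1, 1, 1), (2, 2, 2)) is true.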


Conjugation reverses the n.p.o.:

(13.1.3) Proposition. If λ < μ then μ* < λ*.


Proof. Suppose that μ* ≮ λ*, where λ* is the partition n = n*_1 + n*_2 + ..., etc. (so
that n*_i is the number of j such that n_j ≥ i, by definition of conjugation). Then, for
some i, we have

m*_1 + ... + m*_j ≥ n*_1 + ... + n*_j for j < i

and m*_1 + ... + m*_i < n*_1 + ... + n*_i,

so t = m*_i < n*_i = s.

Now n*_{i+1} + n*_{i+2} + ... is the number of cells in the diagram of λ which lie to the
right of the i-th column; so

\[ n^*_{i+1} + n^*_{i+2} + \cdots = \sum_{j=1}^{s} (n_j - i). \]

Similarly,

\[ m^*_{i+1} + m^*_{i+2} + \cdots = \sum_{j=1}^{t} (m_j - i). \]

Since both diagrams have n cells in all, the second of these sums exceeds the first; so

\[ \sum_{j=1}^{t} (m_j - i) > \sum_{j=1}^{s} (n_j - i) \ge \sum_{j=1}^{t} (n_j - i), \]

the right-hand inequality holding because s > t and n_j ≥ i for j ≤ s. Hence

m_1 + ... + m_t > n_1 + ... + n_t

and so λ ≮ μ.

Now we turn to permutations. We saw in Section 3.5 that any permutation of
{1, ..., n} can be expressed as the product of disjoint cycles, uniquely up to the
order of the factors and the starting point of each cycle. If the cycle lengths are
n_1, ..., n_k, then n = n_1 + ... + n_k, and so we have a partition of n, which is called
the cycle structure of the permutation. Thus, cycle structure defines a map from the
symmetric group S_n to the set of partitions of n.

Two permutations g_1, g_2 ∈ S_n are said to be conjugate if g_2 = h^{-1} g_1 h for some
h ∈ S_n. Conjugacy is an equivalence relation on S_n, whose equivalence classes are
called conjugacy classes.⁵


(13.1.4) Proposition. Two permutations have the same cycle structure if and only if 
they are conjugate. 


PROOF. Suppose that g_2 = h^{-1} g_1 h. Let (x_1 x_2 ... x_k) be a cycle of g_1, so that
x_i g_1 = x_{i+1} for i = 1, ..., k - 1, and x_k g_1 = x_1. Let y_i = x_i h for i = 1, ..., k. Then,
for i = 1, ..., k - 1, we have

y_i g_2 = y_i h^{-1} g_1 h = x_i g_1 h = x_{i+1} h = y_{i+1},

and similarly y_k g_2 = y_1. Thus, (y_1 y_2 ... y_k) is a cycle of g_2. Thus, we obtain the
cycle decomposition of g_2 from that of g_1 by replacing each point by its image under
h. So the cycle structures are equal.

Conversely, let g_1 and g_2 have the same cycle structure. Calculate the cycle
decomposition of each, and write that of g_2 under that of g_1 so that cycles of
the same length correspond vertically. Now let h be the permutation obtained by
mapping each point in the decomposition of g_1 to the point vertically below it. (So,
if we forget all the brackets, what is written down is the two-line form of h.) Then
h^{-1} g_1 h = g_2, by the same calculation as before.

For example, if g_1 = (1 2 3)(4 5)(6) and g_2 = (2 5 3)(4 6)(1), then g_2 = h^{-1} g_1 h,
where

\[ h = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 \\ 2 & 5 & 3 & 4 & 6 & 1 \end{pmatrix} = (1\ 2\ 5\ 6)(3)(4) \]

(in cycle notation).
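
The example can be verified mechanically. The Python sketch below is an illustration with hypothetical function names; it stores a permutation as a dictionary sending each point to its image and, following the convention of the text, composes from left to right, so that x(gh) means 'apply g, then h'.

```python
def from_cycles(cycles, n):
    """Permutation of {1, ..., n} with the given cycles; other points are fixed."""
    perm = {x: x for x in range(1, n + 1)}
    for cyc in cycles:
        for a, b in zip(cyc, cyc[1:] + cyc[:1]):
            perm[a] = b
    return perm


def then(*perms):
    """Compose left to right: each point goes through perms[0], then perms[1], ..."""
    def image(x):
        for p in perms:
            x = p[x]
        return x
    return {x: image(x) for x in perms[0]}


def inverse(p):
    return {v: k for k, v in p.items()}


def cycle_structure(p):
    """Cycle lengths of p, largest first."""
    seen, lengths = set(), []
    for start in p:
        if start in seen:
            continue
        length, x = 0, start
        while x not in seen:
            seen.add(x)
            x = p[x]
            length += 1
        lengths.append(length)
    return sorted(lengths, reverse=True)


g1 = from_cycles([(1, 2, 3), (4, 5)], 6)
g2 = from_cycles([(2, 5, 3), (4, 6)], 6)
h = from_cycles([(1, 2, 5, 6)], 6)
```

With these definitions, then(inverse(h), g1, h) is equal to g2, and both g1 and g2 have cycle structure 3 + 2 + 1.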

It is clear that every partition of n is realised as the cycle structure of some
permutation; so

the number of conjugacy classes in S_n is p(n).


But we can do better, and calculate the conjugacy class sizes: 


(13.1.5) Proposition. Let λ = 1^{a_1} 2^{a_2} ... n^{a_n} be a partition of n. Then the number of
permutations with cycle structure λ is⁶

\[ \frac{n!}{\prod_{i=1}^{n} i^{a_i} \, a_i!}. \]

Proof. If we write out the brackets for the cycle decomposition of such a permutation,
there are n! ways of entering the numbers 1, ..., n into the spaces. But we can
start each of the a_i cycles of length i in any position in the cycle, in i^{a_i} ways, and
permute these cycles arbitrarily, in a_i! ways, for each i; so we have to divide n! by
the product of all these numbers.
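
As a check on the formula, the following Python sketch (not from the text; the names class_size, cycle_type and counts are illustrative) computes the class size from the multiplicities a_i and compares it with a direct count over all 24 permutations of a 4-element set.

```python
from collections import Counter
from itertools import permutations
from math import factorial


def class_size(parts):
    """Number of permutations whose cycle lengths are the given parts."""
    n = sum(parts)
    denom = 1
    for i, a in Counter(parts).items():  # a = number of parts equal to i
        denom *= i ** a * factorial(a)
    return factorial(n) // denom


def cycle_type(perm):
    """Cycle lengths (largest first) of a permutation of 0, ..., n-1 given as a
    tuple whose entry in position k is the image of k."""
    seen, lengths = set(), []
    for start in range(len(perm)):
        if start not in seen:
            length, x = 0, start
            while x not in seen:
                seen.add(x)
                x = perm[x]
                length += 1
            lengths.append(length)
    return tuple(sorted(lengths, reverse=True))


# Tally the cycle types of all permutations of a 4-element set.
counts = Counter(cycle_type(p) for p in permutations(range(4)))
```

The tally gives class sizes 6, 8, 3, 6 and 1 for the partitions 4, 3+1, 2+2, 2+1+1 and 1+1+1+1 respectively, which add up to 4! = 24.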


5 Conjugacy is an equivalence relation in any group. (Prove this.) 
6 In this expression, i^{a_i} has its usual mathematical meaning.


13.2. Euler’s Pentagonal Numbers Theorem


A pentagonal number is a number of the form k(3k - 1)/2 or k(3k + 1)/2 for some
non-negative integer k. Alternatively, it is a number of the form k(3k - 1)/2 for
some (positive, negative, or zero) integer k. The second description is preferable,
since it generates zero once only, whereas the first produces zero twice. The reason
for the name is shown by the pictures of pentagonal numbers for small positive k.


Fig. 13.2. Small pentagonal numbers 


The next theorem, due to Euler, is quite unexpected, as is its application: it will 
enable us to derive an efficient recurrence relation for the partition numbers. 


(13.2.1) Euler’s Pentagonal Numbers Theorem 
(a) If n is not a pentagonal number, then the numbers of partitions
of n into an even and an odd number of distinct parts are equal.
(b) If n = k(3k - 1)/2 for some k ∈ Z, then the number of partitions
of n into an even number of distinct parts exceeds the number
of partitions into an odd number of distinct parts by one if k
is even, and vice versa if k is odd.


For example, if n = 6, there are four partitions of n into distinct parts, viz.
6 = 5+1 = 4+2 = 3+2+1, two of each parity; while if n = 7, there are five such
partitions, viz. 7 = 6+1 = 5+2 = 4+3 = 4+2+1, three with an even and two
with an odd number of parts.
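
The statement of the theorem is easily tested by machine for small n. The Python sketch below is an illustration with hypothetical function names: it lists the partitions of n into distinct parts, forms the difference between the numbers with an even and with an odd number of parts, and compares this with the sign predicted by the theorem.

```python
def distinct_partitions(n, largest=None):
    """Yield the partitions of n into distinct parts, as strictly decreasing tuples."""
    if largest is None:
        largest = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, largest), 0, -1):
        for rest in distinct_partitions(n - first, first - 1):
            yield (first,) + rest


def even_minus_odd(n):
    """(number with an even number of parts) - (number with an odd number of parts)."""
    return sum(1 if len(p) % 2 == 0 else -1 for p in distinct_partitions(n))


def pentagonal_sign(n):
    """+1 or -1 according to the parity of k if n = k(3k - 1)/2 for an integer k; else 0."""
    for k in range(-n, n + 1):
        if k * (3 * k - 1) // 2 == n:
            return 1 if k % 2 == 0 else -1
    return 0
```

For n = 6 the difference is 0, and for n = 7 (which is k(3k - 1)/2 with k = -2) it is +1, as in the example above.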


Proof. To demonstrate Euler’s Theorem, we try to produce a bijection between
partitions with an even and an odd number of distinct parts; we succeed unless n is
a pentagonal number, in which case a unique partition is left out.
Let λ be any partition of n into distinct parts. We define two subsets of the
diagram D(λ) as follows:
• The base is the bottom row of the diagram (the smallest part).
• The slope is the set of cells starting at the east end of the top row and proceeding
in a south-westerly direction for as long as possible.
Note that any cell in the slope is the last in its row, since the row lengths are all
distinct. See Fig. 13.3.


Fig. 13.3. Base and slope 


Now we divide the set of partitions of n with distinct parts into three classes, as
follows:

• Class 1 consists of the partitions for which either the base is longer than the
slope and they don’t intersect, or the base exceeds the slope by at least 2;
• Class 2 consists of the partitions for which either the slope is at least as long as
the base and they don’t intersect, or the slope is strictly longer than the base;
• Class 3 consists of all other partitions with distinct parts.

Given a partition λ in Class 1, we create a new partition λ' by removing the
slope of λ and installing it as a new base, to the south of the existing diagram.
In other words, if the slope of λ contains k cells, we remove one from each of the
largest k parts, and add a new (smallest) part of size k. This is a legal partition with
all parts distinct. Moreover, the base of λ' is the slope of λ, while the slope of λ' is
at least as large as the slope of λ, and strictly larger if it meets the base. So λ' is in
Class 2.

In the other direction, let λ' be in Class 2. We define λ by removing the base of
λ' and installing it as a new slope. Again, we have a partition with all parts distinct,
and it lies in Class 1. (If the base and slope of λ meet, the base is one greater
than the second-last row of λ', which is itself greater than the base of λ', which has
become the slope of λ. If they don’t meet, the argument is similar.)

The partition shown in Fig. 13.3 is in Class 2; the corresponding Class 1 partition 
is shown in Fig. 13.4. 


Fig. 13.4. A Class 1 partition 


These bijections are mutually inverse. Thus, the numbers of Class 1 and Class 2 
partitions are equal. Moreover, these bijections change the number of parts by 1, 
and hence change its parity. So, in the union of Classes 1 and 2, the numbers of 
partitions with even and odd numbers of parts are equal. 

Now we turn to Class 3. A partition in this class has the property that its
base and slope intersect, and either their lengths are equal, or the base exceeds the
slope by 1. So, if there are k parts, then n = k^2 + k(k - 1)/2 = k(3k - 1)/2 or
n = k(k + 1) + k(k - 1)/2 = k(3k + 1)/2. Fig. 13.5 shows the two possibilities.


Fig. 13.5. Two Class 3 partitions 


So, if n is not pentagonal, then Class 3 is empty; and, if n = k(3k - 1)/2, for
some k ∈ Z, then it contains a single partition with |k| parts. Euler’s Theorem
follows.
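
The three classes and the two maps in the proof can be programmed directly. In the Python sketch below (an illustration under the assumption that a partition with distinct parts is stored as a strictly decreasing tuple; all names are hypothetical), the slope length is the number of initial parts which decrease in steps of exactly one, and the base and slope meet precisely when the slope reaches the last row.

```python
def distinct_partitions(n, largest=None):
    """Yield the partitions of n into distinct parts, as strictly decreasing tuples."""
    if largest is None:
        largest = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, largest), 0, -1):
        for rest in distinct_partitions(n - first, first - 1):
            yield (first,) + rest


def slope_length(parts):
    s = 1
    while s < len(parts) and parts[s] == parts[s - 1] - 1:
        s += 1
    return s


def classify(parts):
    """Return 1, 2 or 3 according to the class of a non-empty partition."""
    base, slope = parts[-1], slope_length(parts)
    meet = slope == len(parts)
    if base > slope and (not meet or base >= slope + 2):
        return 1
    if slope >= base and (not meet or slope > base):
        return 2
    return 3


def to_class2(parts):
    """Remove the slope of a Class 1 partition and install it as a new base."""
    s = slope_length(parts)
    return tuple(p - 1 if j < s else p for j, p in enumerate(parts)) + (s,)


def to_class1(parts):
    """Remove the base of a Class 2 partition and install it as a new slope."""
    b = parts[-1]
    return tuple(p + 1 if j < b else p for j, p in enumerate(parts[:-1]))


# Pentagonal numbers k(3k - 1)/2 for -5 <= k <= 5 (enough for n up to 20).
PENTAGONAL = {k * (3 * k - 1) // 2 for k in range(-5, 6)}
```

For instance, the partition 7 = 5 + 2 lies in Class 1 and is sent to 4 + 2 + 1 in Class 2, while 7 = 4 + 3 is the unique Class 3 partition of 7.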


(13.2.2) Corollary.

\[ \prod_{n \ge 1} (1 - t^n) = \sum_{k=-\infty}^{\infty} (-1)^k t^{k(3k-1)/2}. \]


Proof. By Euler’s Pentagonal Numbers Theorem, the right-hand side is the generating
function for even(n) - odd(n), where even(n) and odd(n) are the numbers
of partitions having all parts distinct and having an even or odd number of parts
respectively. We must show that the same is true for the left-hand side.

The coefficient of t^n is made up of contributions from factors (1 - t^{n_1}), ...,
(1 - t^{n_k}), where n_1 + ... + n_k = n and n_1, ..., n_k are distinct; the contribution from this
choice of factors is (-1)^k. So each term counted by even(n) contributes 1, and each
term counted by odd(n) contributes -1. So the theorem is proved.


The right-hand side can be written as

\[ 1 + \sum_{k > 0} (-1)^k \left( t^{k(3k-1)/2} + t^{k(3k+1)/2} \right), \]


using the first ‘definition’ of the pentagonal numbers. From this, we deduce the 
promised recurrence for the partition numbers. This illustrates the general principle 
that finding a linear recurrence relation for a sequence is equivalent to finding the 
inverse of its generating function (see Chapter 4, Exercise 12). 


(13.2.3) Corollary. For n > 0,

\[ p(n) = \sum_{k > 0} (-1)^{k-1} \left( p(n - \tfrac{1}{2}k(3k-1)) + p(n - \tfrac{1}{2}k(3k+1)) \right) \]
\[ \phantom{p(n)} = p(n-1) + p(n-2) - p(n-5) - p(n-7) + p(n-12) + \cdots, \]

with the convention that p(n) = 0 for n < 0.


PROOF. Since

\[ \sum_{n \ge 0} p(n) t^n = \prod_{n > 0} (1 - t^n)^{-1}, \]

we have

\[ \left( \sum_{n \ge 0} p(n) t^n \right) \cdot \left( 1 + \sum_{k > 0} (-1)^k \left( t^{k(3k-1)/2} + t^{k(3k+1)/2} \right) \right) = 1. \]

For n > 0, the coefficient of t^n in the product is zero. Thus,

\[ 0 = p(n) + \sum_{k > 0} (-1)^k \left( p(n - \tfrac{1}{2}k(3k-1)) + p(n - \tfrac{1}{2}k(3k+1)) \right), \]

from which the result follows.

This is a linear recurrence relation in which the number of terms grows with n,
but relatively slowly: there are about √(8n/3) pentagonal numbers below n. Thus,
it permits efficient calculation: p(n) can be evaluated with O(n^{3/2}) additions or
subtractions.
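
The recurrence translates directly into a short program. The following Python sketch is an illustration only (the name partition_numbers is hypothetical); it uses the convention p(0) = 1 and simply stops the sum as soon as the pentagonal number exceeds n, which takes care of the convention p(n) = 0 for n < 0.

```python
def partition_numbers(N):
    """Return the list [p(0), p(1), ..., p(N)] using Euler's recurrence."""
    p = [1] + [0] * N
    for n in range(1, N + 1):
        total, k = 0, 1
        while k * (3 * k - 1) // 2 <= n:
            sign = 1 if k % 2 == 1 else -1  # the sign (-1)^(k-1)
            total += sign * p[n - k * (3 * k - 1) // 2]
            if k * (3 * k + 1) // 2 <= n:
                total += sign * p[n - k * (3 * k + 1) // 2]
            k += 1
        p[n] = total
    return p
```

For example, partition_numbers(100)[100] evaluates p(100) using only additions and subtractions of earlier values.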


13.3. Project: Jacobi’s Identity 


In this section, I give a delightful proof, due to Richard Borcherds, of an identity of
Jacobi.⁶ The proof has the appearance of physics, although it is pure combinatorics;
it involves double-counting states of Dirac electrons! 


Jacobi’s Identity asserts:


(13.3.1) Jacobi’s Triple Product Identity

\[ \prod_{n > 0} (1 + q^{2n-1} z)(1 + q^{2n-1} z^{-1})(1 - q^{2n}) = \sum_{l \in \mathbb{Z}} q^{l^2} z^l. \]


It is an identity between formal power series in the indeterminates q and z. By replacing q by
q^{1/2} and moving the third term in the product to the right-hand side, the identity takes the form

\[ \prod_{n > 0} (1 + q^{n-1/2} z)(1 + q^{n-1/2} z^{-1}) = \left( \sum_{l \in \mathbb{Z}} q^{l^2/2} z^l \right) \left( \prod_{n > 0} (1 - q^n)^{-1} \right) \qquad (*), \]

in which form we will prove it.
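
Before embarking on the proof, the first form of the identity can be checked numerically up to any given power of q. The Python sketch below is an illustration with hypothetical names: a series is stored as a dictionary mapping a pair (exponent of q, exponent of z) to its coefficient, and all terms with q-exponent above max_q are discarded, which is harmless because every factor involves only non-negative powers of q.

```python
from collections import defaultdict


def multiply(poly, factor, max_q):
    """Product of two series, keeping only terms with q-exponent <= max_q."""
    out = defaultdict(int)
    for (a, b), c in poly.items():
        for (da, db), d in factor.items():
            if a + da <= max_q:
                out[(a + da, b + db)] += c * d
    return {key: v for key, v in out.items() if v != 0}


def triple_product(max_q):
    """Left-hand side: product over n of (1 + q^(2n-1) z)(1 + q^(2n-1)/z)(1 - q^(2n))."""
    poly = {(0, 0): 1}
    for n in range(1, max_q + 1):  # later factors cannot affect the kept terms
        poly = multiply(poly, {(0, 0): 1, (2 * n - 1, 1): 1}, max_q)
        poly = multiply(poly, {(0, 0): 1, (2 * n - 1, -1): 1}, max_q)
        poly = multiply(poly, {(0, 0): 1, (2 * n, 0): -1}, max_q)
    return poly


def theta_sum(max_q):
    """Right-hand side: sum over integers l of q^(l^2) z^l, truncated in the same way."""
    out, l = {}, 0
    while l * l <= max_q:
        out[(l * l, l)] = 1
        out[(l * l, -l)] = 1
        l += 1
    return out
```

The two dictionaries triple_product(M) and theta_sum(M) should coincide for every truncation level M; this is of course a check and not a proof.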
A level is a number of the form n + 1/2, where n is an integer. A state is a set of levels
which contains all but finitely many negative levels and only finitely many positive levels. The state
consisting of all the negative levels and no positive ones is called the vacuum. Given a state S, we
define the energy of S to be

Σ{l : l > 0, l ∈ S} - Σ{l : l < 0, l ∉ S},

while the particle number of S is

|{l : l > 0, l ∈ S}| - |{l : l < 0, l ∉ S}|.


Although it is not necessary for the proof, a word about the background is in order! 

Dirac showed that relativistic electrons could have negative as well as positive energy. Since they 
jump to a level of lower energy if possible, Dirac hypothesised that, in a vacuum, all the negative 
energy levels are occupied. Since electrons obey the exclusion principle, this prevents further electrons 
from occupying these states. Electrons in negative levels are not detectable. If an electron gains 
enough energy to jump to a positive level, then it becomes ‘visible’; and the ‘hole’ it leaves behind 
behaves like a particle with the same mass but opposite charge to an electron. (A few years later, 
positrons were discovered filling these specifications.) If the vacuum has no net particles and zero 
energy, then the energy and particle number of any state should be relative to the vacuum, giving 
rise to the definitions given. 


6 Jacobi’s Identity implies Euler’s Pentagonal Numbers Theorem: see Exercise 10. 

We show that the coefficient of $q^mz^l$ on either side of (*) is equal to the number of states with 
energy m and particle number l. This will prove the identity. 

For the left-hand side this is straightforward. A term in the expansion of the product is obtained 
by selecting $q^{n-1/2}z$ or $q^{n-1/2}z^{-1}$ from finitely many factors. These correspond to the presence of an 
electron in positive level $n-\frac12$ (contributing $n-\frac12$ to the energy and 1 to the particle number), or 
a hole in negative level $-(n-\frac12)$ (contributing $n-\frac12$ to the energy and −1 to the particle number). 
So the coefficient of $q^mz^l$ is as claimed. 

The right-hand side is a little harder. Consider first the states with particle number 0. Any such 
state can be obtained in a unique way from the vacuum by moving the electrons in the top k negative 
levels up by $n_1, n_2, \dots, n_k$, say, where $n_1 \ge n_2 \ge \dots \ge n_k$. (The monotonicity is equivalent to the 
requirement that no electron jumps over another.) The energy of the state is thus $m = n_1+\dots+n_k$. 
Thus, the number of states with energy m and particle number 0 is equal to the number p(m) of 
partitions of m, which is the coefficient of $q^m$ in $\Pi(q) = \prod_{n>0}(1-q^n)^{-1}$, by (13.1.1). 

Now consider states with positive particle number l. There is a unique ground state, in which 
all negative levels and the first l positive levels are filled; its energy is 
$\frac12+\frac32+\dots+\frac{2l-1}{2} = \frac12 l^2$, and its particle number is l. Any other state 
with particle number l is obtained from this one by ‘jumping’ electrons up as before; so the number 
of such states with energy m is $p(m-\frac12 l^2)$, which is the coefficient of $q^mz^l$ in 
$q^{l^2/2}z^l\,\Pi(q)$, as required. 
The argument for negative particle number is similar. 
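
As a sanity check, the identity can be tested numerically in its original form, $\prod_{n>0}(1+q^{2n-1}z)(1+q^{2n-1}z^{-1})(1-q^{2n}) = \sum_{l} q^{l^2}z^l$, where all exponents of q are integers. The sketch below is an illustration only: the dictionary representation of series and the names `multiply`, `triple_product` and `theta_series` are assumptions, not part of the text. Both sides are truncated at a chosen power of q.

```python
from collections import defaultdict

def multiply(f, g, max_q):
    """Multiply two series stored as {(q_exponent, z_exponent): coefficient},
    discarding every term whose q-exponent exceeds max_q."""
    out = defaultdict(int)
    for (a1, b1), c1 in f.items():
        for (a2, b2), c2 in g.items():
            if a1 + a2 <= max_q:
                out[(a1 + a2, b1 + b2)] += c1 * c2
    return {key: c for key, c in out.items() if c != 0}

def triple_product(max_q):
    """Left-hand side of the identity, truncated at q^max_q."""
    series = {(0, 0): 1}
    n = 1
    while 2 * n - 1 <= max_q:   # later factors only affect higher powers of q
        series = multiply(series, {(0, 0): 1, (2 * n - 1, 1): 1}, max_q)
        series = multiply(series, {(0, 0): 1, (2 * n - 1, -1): 1}, max_q)
        series = multiply(series, {(0, 0): 1, (2 * n, 0): -1}, max_q)
        n += 1
    return series

def theta_series(max_q):
    """Right-hand side: q^(l^2) z^l for all integers l with l^2 <= max_q."""
    series = {}
    l = 0
    while l * l <= max_q:
        series[(l * l, l)] = 1
        series[(l * l, -l)] = 1
        l += 1
    return series

print(triple_product(12) == theta_series(12))
```

Agreement up to a finite power of q is of course evidence rather than proof; the proof is the state-counting argument above.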
Our definition of a tableau is not the most general one possible; what is defined 
here is usually called a standard tableau, but I will not talk about any non-standard 
tableaux!⁷ 

Let λ be a partition of n, with diagram D(λ). A tableau, or Young tableau, with 
shape λ, is an assignment of the numbers 1, 2, ..., n to the cells of D(λ), in such a 
way that the numbers in any row or column are strictly increasing. For example, 
the three tableaux with shape $3^11^1$ are shown in Fig. 13.6. 


Fig. 13.6. Tableaux 


The number of tableaux with shape λ is denoted by $f_\lambda$. Clearly, we have 
$f_\lambda = f_{\lambda^*}$, the corresponding tableaux being related by transposition. 

There is a somewhat unexpected formula for $f_\lambda$. Given a cell (i, j) of the 
diagram D(λ), the hook H(i, j) associated with it is the set consisting of this cell 
and all those cells to the south or east of it; that is, all cells (i, j′) in the diagram 
with j′ ≥ j, and all cells (i′, j) with i′ ≥ i. The hook length h(i, j) is the number of 
cells in the hook H(i, j). 

(13.4.1) Theorem. 
$$f_\lambda = \frac{n!}{\prod_{(i,j)\in D(\lambda)} h(i,j)}.$$


7 Plural of tableau. 

In his book Symmetric Functions and Hall Polynomials, Ian Macdonald refers 
on p. 53 to ‘The fact that the number of standard tableaux of shape λ is equal to 
n!/h(λ)’, and says ‘No direct combinatorial proof seems to be known.’ The note 
refers to a proof of this hook length formula at the end of a series of exercises, 
quoting earlier results on symmetric functions. I do not plan to trace through the 
argument here! 
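
The hook length formula is easy to evaluate by machine. Here is a minimal Python sketch under stated assumptions: a partition is passed as a non-empty, non-increasing list of parts, rows and columns are indexed from 0, and the function names `hook_lengths` and `count_tableaux` are chosen for illustration.

```python
from math import factorial

def hook_lengths(shape):
    """Hook lengths of the diagram of a partition, row by row."""
    # columns[j] = number of rows with more than j cells = length of column j
    columns = [sum(1 for part in shape if part > j) for j in range(shape[0])]
    return [[(shape[i] - j) + (columns[j] - i) - 1 for j in range(shape[i])]
            for i in range(len(shape))]

def count_tableaux(shape):
    """Number of standard tableaux of the given shape, by the hook length formula."""
    product = 1
    for row in hook_lengths(shape):
        for h in row:
            product *= h
    return factorial(sum(shape)) // product

print(hook_lengths([3, 1]))
print(count_tableaux([3, 1]))
```

For the shape with parts 3 and 1, the hook lengths multiply to 8, and 4!/8 = 3 agrees with the three tableaux mentioned for Fig. 13.6.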


The numbers $f_\lambda$ have another combinatorial interpretation. Let λ be the partition 
$n = n_1+\dots+n_k$, where (as usual) $n_1\ge\dots\ge n_k$. Suppose that, in an election, n 
voters cast their votes for k candidates, with the i-th candidate receiving $n_i$ votes for 
i = 1, ..., k. Then the number of ways in which the votes can be counted, so that 
at no stage in the count is the j-th candidate ahead of the i-th, for any j > i, is $f_\lambda$. 
To see this, record the count by writing the numbers 1, ..., n in the cells of D(λ), 
where m is put in the i-th row (immediately to the right of the entries already there) 
if the m-th vote goes to the i-th candidate. By assumption, we have a tableau with 
shape λ; and every tableau corresponds to a possible count. 

In particular, if λ is the partition 2n = n + n, then $f_\lambda$ is the Catalan number 
$C_{n+1}$; this interpretation of $f_\lambda$ is in exact agreement with that for the Catalan 
number given in Exercise 15(b) of Chapter 4. So the numbers $f_\lambda$ generalise the 
Catalan numbers. We can check the hook length formula (13.4.1) in this case. The 
hook lengths for this partition λ are n+1, n, ..., 2 in the first row, and n, n−1, ..., 1 
in the second; so 

$$f_\lambda = \frac{(2n)!}{(n+1)!\,n!} = \frac{1}{n+1}\binom{2n}{n},$$

in agreement with (4.5.2). 


Another important property of tableaux is the Robinson-Schensted-Knuth 
correspondence: 

(13.4.2) Robinson-Schensted-Knuth correspondence. There is a bijection between the 
set of permutations of {1, ..., n}, and the set of ordered pairs of tableaux of the 
same shape. Under this bijection, if g corresponds to the pair (S, T) of tableaux, 
then $g^{-1}$ corresponds to (T, S). In particular, the two tableaux corresponding to a 
permutation $g\in S_n$ are equal if and only if $g^2 = 1$. 


Proof. We give a constructive proof, of course! We build a pair (S, T) of tableaux 
from a permutation g, which we take in passive form $(a_1,\dots,a_n)$. The construction 
proceeds step by step. Before the first step, S and T are empty. At the start of the 
i-th step, S and T are ‘partial tableaux’ with i − 1 cells, having the same shape. (This 
means that their entries are distinct but not necessarily the first i − 1 natural numbers, 
and the rows and columns are strictly increasing. In fact, T is a genuine standard 
tableau, but S is not in general.) In step i, we add a new cell to the shape, and add 
entries $a_i$ to S and i to T, in a manner to be described. The procedure is recursive; 
we define a ‘subroutine’ called INSERT, which puts an integer a in the j-th row of a 
partial tableau T. 

Subroutine: INSERT a into the j-th row 
If a is greater than the last element of the j-th row, then append it 
to this row. (If the j-th row is empty, put a in the first position.) 
Otherwise, let x be the smallest element of the j-th row for which 
a < x. ‘Bump’ x out of the j-th row, replacing it with a; then INSERT 
x into the (j + 1)-th row. 


Now we can give a complete specification of the RSK algorithm: 


RSK algorithm 
Start with S and T empty. 
For i = 1, ..., n, do the following: 

• INSERT $a_i$ into the first row of S. This causes a cascade of 
‘bumps’, ending with a new cell being created and a number 
(not less than $a_i$) written into it. 

• Now create a new cell in the same position in T and write i 
into it. 


We have to check that, after the i-th stage, S and T are partial tableaux. The fact 
that rows and columns are increasing is, for S, a consequence of the way INSERT 
works; for T, it is because i is greater than any element previously in the tableau. 
The point of substance is that the newly created cell doesn’t violate the condition 
that the row lengths are non-increasing; that is, there should be a cell immediately 
above it. This is because the element ‘bumped’ is smaller than the element to the 
right of the position it is ‘bumped’ out of, and so it comes to rest to the left of this 
position. 

At the end of the algorithm, we have two tableaux of the same shape. 
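
The INSERT subroutine and the main loop translate directly into code. The sketch below is one possible rendering in Python, with some choices that are assumptions rather than part of the text: the recursion is unrolled into a loop, tableaux are stored as lists of rows, and the function name `rsk` is hypothetical. The standard-library call `bisect_right` finds the position of the smallest x in a row with a < x.

```python
from bisect import bisect_right

def rsk(perm):
    """Return the pair (S, T) built from a permutation given in passive form."""
    S, T = [], []
    for step, a in enumerate(perm, start=1):
        row = 0
        while True:
            if row == len(S):                 # past the last row: create a new row
                S.append([a])
                T.append([step])
                break
            pos = bisect_right(S[row], a)     # index of the smallest x with a < x
            if pos == len(S[row]):            # a exceeds every entry: append it
                S[row].append(a)
                T[row].append(step)
                break
            S[row][pos], a = a, S[row][pos]   # bump x; carry it to the next row
            row += 1
    return S, T

print(rsk([2, 3, 1]))
print(rsk([3, 1, 2]))   # the inverse permutation
```

Running it on a permutation and on its inverse gives a quick check of the claim that the two tableaux are interchanged.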


We illustrate the algorithm with the permutation (2, 3, 1). 


Fig. 13.7. The RSK algorithm (panels: Stage 1, Stage 2) 

At stage 3, 2 is ‘bumped’ by 1 into the second row. 


The procedure can be reversed, to construct a permutation from a pair of 
standard tableaux of the same shape. To see this, note that we can locate n in 
the tableau T, and then reconstruct the cascade of ‘bumps’ required to move the 
corresponding element of S to that position; the insertion triggering this cascade 
is $a_n$. Working back in the same way, we recover the entire permutation. (A few 
worked examples make this clearer than pages of explanation!) 


Now we come to the final claim that, if (S, T) corresponds to g, then (T, S) 
corresponds to $g^{-1}$. My argument here will be somewhat ‘hand-waving’. Let g and 
$g^{-1}$ have passive forms $(a_1,\dots,a_n)$ and $(b_1,\dots,b_n)$ respectively. Thus, $a_i = j$ if and 
only if $b_j = i$. For the permutation g, stage i in the construction inserts $a_i$ into S and 
i into T; $a_i$ goes into the first row, and i into a position determined by a cascade 
of ‘bumps’ in S. Subsequently, i keeps its place in T, but $a_i$ may be ‘bumped’ down 
by subsequent insertions corresponding to values of s with s > i but $a_s < a_i$. Each 
‘bump’ moves it down one row. 

Now, corresponding to $g^{-1}$, at stage j, we insert $b_j = i$ into the first row of S, 
and j into T, in a position determined by a sequence of bumps in S. One can check 
that these are the same bumps that moved $a_i$ before, but all in a single cascade 
rather than one at a time. Dually, the bumps which subsequently move $b_j$ down 
are those which determine the position of i in the previous case. So the resulting 
tableaux S and T are precisely the T and S corresponding to g, and the claim is 
proved. 


(13.4.3) Corollary. (a) $\sum_{\lambda\vdash n} f_\lambda^2 = n!$. 
(b) $\sum_{\lambda\vdash n} f_\lambda = s(n)$, where s(n) is the number of solutions of $g^2 = 1$ in $S_n$. 


The function s(n) was considered in Section 4.4, where we proved a lower bound 
for it. We can now re-do this and give an upper bound too. 

(13.4.4) Corollary. $\sqrt{n!} \le s(n) \le \sqrt{p(n)\,n!}$. 

Proof. (a) $(\sum f_\lambda)^2 \ge \sum f_\lambda^2$, since the right-hand side omits all ‘product’ terms $2f_\lambda f_\mu$. 
(b) The vectors (1, 1, ..., 1) and $(f_{\lambda_1}, f_{\lambda_2}, \dots, f_{\lambda_{p(n)}})$ in the Euclidean space of 
dimension p(n) have lengths $\sqrt{p(n)}$ and $\sqrt{n!}$, and inner product s(n); the upper bound 
is then the Cauchy-Schwarz inequality. 
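
Both corollaries can be checked by brute force for small n. The following Python sketch is illustrative only and makes its own choices: partitions are generated recursively as tuples, $f_\lambda$ is computed from the hook length formula (13.4.1), and s(n) is counted by listing all permutations. The names `partitions`, `count_tableaux` and `involution_count` are assumptions.

```python
from itertools import permutations
from math import factorial

def partitions(n, largest=None):
    """Yield the partitions of n as non-increasing tuples."""
    if largest is None:
        largest = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, largest), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest

def count_tableaux(shape):
    """Number of standard tableaux of a non-empty shape, by hook lengths."""
    columns = [sum(1 for part in shape if part > j) for j in range(shape[0])]
    product = 1
    for i, part in enumerate(shape):
        for j in range(part):
            product *= (part - j) + (columns[j] - i) - 1
    return factorial(sum(shape)) // product

def involution_count(n):
    """Number of permutations g of n points with g^2 = 1, by exhaustive search."""
    return sum(1 for g in permutations(range(n))
               if all(g[g[i]] == i for i in range(n)))

for n in range(1, 7):
    f = [count_tableaux(shape) for shape in partitions(n)]
    s = involution_count(n)
    # Corollary 13.4.3 (a) and (b), then Corollary 13.4.4 in squared form.
    print(n, sum(x * x for x in f) == factorial(n), sum(f) == s,
          factorial(n) <= s * s <= len(f) * factorial(n))
```

The bounds are compared after squaring so that only integer arithmetic is needed.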
Let $x_1,\dots,x_N$ be indeterminates. A polynomial $f(x_1,\dots,x_N)$ is called symmetric 
if it is left unchanged by any permutation of its arguments: $f(x_{1g},\dots,x_{Ng}) = 
f(x_1,\dots,x_N)$ for all $g\in S_N$. (The older term ‘symmetric functions’ is often used; I 
will avoid this since it has at least two more general meanings.) 

Any symmetric polynomial can be written uniquely as a sum of parts which are 
homogeneous (that is, every term has the same total degree). These homogeneous 
parts are themselves symmetric. So we may restrict our attention to homogeneous 
symmetric polynomials, of degree n, say. 

We now define some special classes of symmetric polynomials. Let λ be the 
partition $n = n_1+\dots+n_k$. 

(a) The basic polynomial $m_\lambda$ is the sum of the term $x_1^{n_1}\cdots x_k^{n_k}$ and all the other 
terms which can be obtained from this one by permuting the indeterminates. (If 
some of the parts $n_i$ are equal, the same term will come up more than once; but 
each term is only included once.) 

(b) The elementary symmetric polynomial $e_n$ is the sum of all products of n 
distinct indeterminates; the complete symmetric polynomial $h_n$ is the sum of all 
products of n indeterminates (repetitions allowed); and the power sum polynomial $p_n$ 
is $x_1^n+\dots+x_N^n$. 

(c) If z is one of the symbols e, h or p, then we define $z_\lambda = z_{n_1}\cdots z_{n_k}$. 

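For example, if there are three indeterminates, and λ is the partition 3 = 2 + 1, 
then 

$$m_\lambda = x_1^2x_2 + x_2^2x_1 + x_1^2x_3 + x_3^2x_1 + x_2^2x_3 + x_3^2x_2,$$
$$e_\lambda = (x_1x_2 + x_1x_3 + x_2x_3)(x_1 + x_2 + x_3),$$
$$p_\lambda = (x_1^2 + x_2^2 + x_3^2)(x_1 + x_2 + x_3),$$
$$h_\lambda = e_\lambda + p_\lambda.$$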


(13.5.1) Theorem. For N ≥ n, if z is one of the symbols m, e, h or p, then any 
homogeneous symmetric polynomial f of degree n in $x_1,\dots,x_N$ can be written 
uniquely as a linear combination $\sum_{\lambda\vdash n} c_\lambda z_\lambda$. Moreover, in all cases except z = p, if 
f has integer coefficients, then the numbers $c_\lambda$ are integers. 


Proof. For z = m, this is clear: if one term of $m_\lambda$ occurs in f, then all the other 
terms appear with the same coefficient. 

To show the rest of the theorem, we have to demonstrate that the $m_\lambda$ can be 
expressed as linear combinations of the $z_\lambda$ (with integer coefficients if z ≠ p). I will 
consider z = e now; the others will emerge naturally later. The key fact is: 

Suppose that $e_\lambda = \sum_{\mu\vdash n} a_{\lambda\mu}m_\mu$. Then $a_{\lambda\lambda^*} = 1$, and $a_{\lambda\mu} = 0$ unless 
$\mu \ge \lambda^*$ in the natural partial order. 

For, if λ is the partition $n = n_1+\dots+n_k$, then $e_\lambda$ contains the term 

$$(x_1x_2\cdots x_{n_1})(x_1\cdots x_{n_2})\cdots(x_1\cdots x_{n_k}),$$

which occurs in $m_{\lambda^*}$; so $a_{\lambda\lambda^*} = 1$. Any other monomial in $e_\lambda$ corresponds to a 
partition greater than this one. 

Thus, if the $e_\lambda$ are ordered according to the reverse lexicographic order, and the 
$m_\lambda$ according to the r.l.o. of their duals, then the matrix expressing the es in terms 
of the ms is upper triangular, with diagonal entries 1 and all entries integers. (Recall 
that the r.l.o. is a linear extension of the n.p.o.) So it is invertible, and its inverse 
has the same form. (Compare the Möbius inversion algorithm in Section 12.7.) 


(13.5.2) Corollary. Any symmetric polynomial $f(x_1,\dots,x_N)$ can be written as a 
polynomial $g(z_1,\dots,z_N)$, where z is one of the symbols e, h, p. In the first two cases, 
if f has integer coefficients, then so does g. 

This holds because the $z_\lambda$ are all possible monomials of degree n which can be 
formed from $z_1,\dots,z_N$. 

In the case z = e, this is a version of Newton’s Theorem on symmetric functions. 
The particular significance of this case is that, if $\alpha_1,\dots,\alpha_N$ are the roots of the 
polynomial $\phi(t) = t^N + a_1t^{N-1} + \dots + a_N = 0$, then 

$$a_i = (-1)^i e_i(\alpha_1,\dots,\alpha_N),$$

so any symmetric polynomial in the roots of φ can be written as a polynomial in 
its coefficients. (Newton’s Theorem extends to larger classes of functions, such as 
rational functions.) 
Further results about symmetric polynomials can be expressed conveniently in 
terms of their generating functions. Define 
$$E(t) = \sum_{n\ge 0} e_nt^n,$$
$$H(t) = \sum_{n\ge 0} h_nt^n,$$
$$P(t) = \sum_{n\ge 1} p_nt^{n-1}.$$

(These series of course also involve the indeterminates $x_1,\dots,x_N$.) Now we have 

$$E(t) = \prod_{i=1}^N (1 + x_it),$$
$$H(t) = \prod_{i=1}^N (1 - x_it)^{-1},$$

as is shown by expanding the products on the right in the usual way. In particular: 


(13.5.3) Proposition. (a) $H(t) = E(-t)^{-1}$. 
(b) $\sum_{r=0}^n (-1)^r e_rh_{n-r} = 0$ for n ≥ 1. 

Here (b) comes from expanding E(−t)H(t) = 1. It is a recursive relation 
expressing $e_n$ in terms of $e_0,\dots,e_{n-1}$ and $h_0,\dots,h_n$. By induction, $e_n$ can be 
expressed as a polynomial in $h_0,\dots,h_n$ with integer coefficients. This is equivalent 
to the assertion that the polynomials $e_\lambda$ are linear combinations of the $h_\mu$ with 
integer coefficients. This proves the case z = h of Theorem 13.5.1. 


The situation for P(t) is a little less obvious: 

(13.5.4) Proposition. (a) $\frac{d}{dt}H(t) = P(t)H(t)$ and $\frac{d}{dt}E(t) = P(-t)E(t)$. 
(b) $nh_n = \sum_{r=1}^n p_rh_{n-r}$ and $ne_n = \sum_{r=1}^n (-1)^{r-1}p_re_{n-r}$. 

Proof. (a) 
$$P(t) = \sum_{i=1}^N \frac{x_i}{1-x_it} = \frac{d}{dt}\sum_{i=1}^N \log\frac{1}{1-x_it} = \frac{d}{dt}\log H(t);$$
the argument for the other part is similar. 
(b) comes from (a) by expanding and equating coefficients. 

The result of (b) allows us to express $e_n$ or $h_n$ as polynomials in $p_1,\dots,p_n$, and 
hence $e_\lambda$ or $h_\lambda$ as linear combinations of the $p_\mu$. But this time the coefficients are 
rational numbers, not integers, because of the terms $ne_n$, $nh_n$ in (b). For example, 

$$e_2 = \tfrac12(p_1^2 - p_2), \qquad h_2 = \tfrac12(p_1^2 + p_2).$$
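
The relations between the e, h and p polynomials can be spot-checked by substituting numbers for the indeterminates. The Python sketch below does this for one arbitrarily chosen set of integer values; the function names `e`, `h`, `p` and the sample values are assumptions made for illustration, and a numerical check at one point is of course not a proof.

```python
from itertools import combinations, combinations_with_replacement
from math import prod

def e(n, xs):
    """Elementary symmetric polynomial e_n evaluated at the numbers xs."""
    return sum(prod(c) for c in combinations(xs, n))

def h(n, xs):
    """Complete symmetric polynomial h_n evaluated at the numbers xs."""
    return sum(prod(c) for c in combinations_with_replacement(xs, n))

def p(n, xs):
    """Power sum polynomial p_n evaluated at the numbers xs."""
    return sum(x ** n for x in xs)

xs = [2, 3, 5, 7]   # arbitrary sample values for x_1, ..., x_4
for n in range(1, 6):
    alternating = sum((-1) ** r * e(r, xs) * h(n - r, xs) for r in range(n + 1))
    newton_h = sum(p(r, xs) * h(n - r, xs) for r in range(1, n + 1))
    newton_e = sum((-1) ** (r - 1) * p(r, xs) * e(n - r, xs) for r in range(1, n + 1))
    print(n, alternating == 0, newton_h == n * h(n, xs), newton_e == n * e(n, xs))
```

Since `prod` of an empty tuple is 1, the conventions $e_0 = h_0 = 1$ come out automatically, and $e_n$ evaluates to 0 once n exceeds the number of values.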


There are several further reasons for combinatorialists to be interested in 
symmetric polynomials. One is the fact that we have the indeterminates $x_1,\dots,x_N$ at 
our disposal; substitutions of particular values lead to interesting specialisations. 
For example (taking n = N): 

(a) Putting $x_1 = \dots = x_n = 1$, we have $E(t) = (1+t)^n$ and $e_r = \binom{n}{r}$, giving the 
Binomial Theorem (3.3.1). Similarly, $H(t) = (1-t)^{-n}$ and $h_r = \binom{n+r-1}{r}$. 

(b) Putting $x_i = q^{i-1}$ for i = 1, ..., n, we find that $E(t) = \prod_{i=1}^n (1+q^{i-1}t)$ is the 
left-hand side of the q-binomial Theorem (9.2.4); so 

$$e_r(1,q,\dots,q^{n-1}) = q^{r(r-1)/2}\begin{bmatrix} n \\ r \end{bmatrix}_q,$$

where $\begin{bmatrix} n \\ r \end{bmatrix}_q$ is the Gaussian coefficient. 
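
These specialisations can be confirmed numerically for particular values. The sketch below is an illustration under stated assumptions: it takes n = 5 and q = 3, and it computes the Gaussian coefficient from the standard product formula $\prod_{i=0}^{r-1}(q^{n-i}-1)/(q^{i+1}-1)$, which is not quoted in this section. The function names are hypothetical.

```python
from itertools import combinations, combinations_with_replacement
from math import comb, prod

def e(r, xs):
    """Elementary symmetric polynomial e_r evaluated at the numbers xs."""
    return sum(prod(c) for c in combinations(xs, r))

def h(r, xs):
    """Complete symmetric polynomial h_r evaluated at the numbers xs."""
    return sum(prod(c) for c in combinations_with_replacement(xs, r))

def gaussian(n, r, q):
    """Gaussian coefficient, from the product formula (an assumed standard form)."""
    numerator = prod(q ** (n - i) - 1 for i in range(r))
    denominator = prod(q ** (i + 1) - 1 for i in range(r))
    return numerator // denominator

n, q = 5, 3
ones = [1] * n
powers = [q ** i for i in range(n)]          # 1, q, ..., q^(n-1)
for r in range(n + 1):
    print(r,
          e(r, ones) == comb(n, r),
          h(r, ones) == comb(n + r - 1, r),
          e(r, powers) == q ** (r * (r - 1) // 2) * gaussian(n, r, q))
```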


Secondly, we have now four bases for the space of symmetric polynomials of 
degree n, namely $(m_\lambda)$, $(e_\lambda)$, $(h_\lambda)$ and $(p_\lambda)$. A further important basis consists of 
the Schur functions $s_\lambda$. The transition matrices between these bases define interesting 
arrays of numbers indexed by pairs of partitions. In many cases, these have 
combinatorial significance, or specialise to more familiar numbers, including the 
numbers $f_\lambda$ of standard tableaux (Section 13.4), Stirling and Bell numbers (Sections 
4.5, 5.3), and cycle indices of symmetric and alternating groups (see Section 15.3). 
For algebraists, I mention the fact that the transition matrix from $(p_\lambda)$ to $(s_\lambda)$ is 
the character table of the symmetric group $S_n$. See Macdonald, Symmetric Functions 
and Hall Polynomials, for an overview of this material. Reading it, one can appreciate 
the view held by some people, that if it isn’t related to symmetric polynomials, then 
it isn’t combinatorics! 




13.6. Exercises 


1. In the spirit of Section 3.12, devise an algorithm for generating the partitions of 
n, one at a time, in reverse lexicographic order. 


2. Use the recurrence relation (13.2.3) to calculate p(n) for n < 20. 

3. Prove that $p(n) < F_n$ for n ≥ 5, where $F_n$ is the n-th Fibonacci number. 

4. Show that conjugation of partitions does not reverse the r.l.o. 

5. Define two operations ∘, • on partitions as follows. Let λ: $n = n_1+\dots+n_k$ 
and μ: $m = m_1+\dots+m_l$ be partitions of n and m respectively; undefined parts are 
zero. Then λ∘μ and λ•μ are the partitions of m+n defined thus: for λ∘μ we add 
the parts of λ and μ, viz. 

$$\lambda\circ\mu:\ (n+m) = (n_1+m_1) + (n_2+m_2) + \dots,$$

while the parts of λ•μ are the parts of λ and μ together (arranged in non-increasing 
order). Prove that 

$$(\lambda\circ\mu)^* = \lambda^*\bullet\mu^*.$$
6. (a) Prove that, if k > n/2, the number of permutations in $S_n$ having a cycle of 
length k is n!/k. 
(b) If t(n) is the proportion of permutations in $S_n$ which have a cycle of length 
greater than n/2, show that 
$$\lim_{n\to\infty} t(n) = \log 2.$$

7. Let $\Pi(t) = \sum_{n\ge 0} p(n)t^n$ be the generating function for the partition numbers. 
Let σ(n) be the sum of the divisors of n, and $\Sigma(t) = \sum_{n\ge 1} \sigma(n)t^{n-1}$ its generating 
function. Prove that 
$$\frac{d}{dt}\Pi(t) = \Pi(t)\Sigma(t),$$
and deduce that 
$$np(n) = \sum_{k=1}^n \sigma(k)p(n-k).$$


8. Prove that $h_r(1,q,\dots,q^{n-1}) = \begin{bmatrix} n+r-1 \\ r \end{bmatrix}_q$. 

9. Let $x_i = 1/N$ for 1 ≤ i ≤ N, and let N → ∞. Show that the limiting values of 
E(t) and H(t) are both equal to $e^t$. 

10. Deduce Euler's Pentagonal Numbers Theorem from Jacobi’s Triple Product 
Identity. [HINT: put $q = t^{3/2}$, $z = -t^{-1/2}$.] 

11. Let A be a matrix of zeros and ones, with row sums $n_1\ge\dots\ge n_k > 0$ and 
column sums $m_1\ge\dots\ge m_l > 0$; let λ and μ be the partitions $n = n_1+\dots+n_k$ 
and $n = m_1+\dots+m_l$. Show that the polynomial $e_\lambda$ contains a term $x_1^{m_1}\cdots x_l^{m_l}$. 
Show further that, if 

$$e_\lambda = \sum_{\mu\vdash n} a_{\lambda\mu}m_\mu,$$

then $a_{\lambda\mu}$ is equal to the number of matrices A which satisfy the above conditions. 


14. Automorphism groups and 
permutation groups 


There is transitive motion and there is intransitive motion: the motion of a 
galloping horse is transitive, it passes through our field of vision and continues 
on to wherever it is going; the motion in a tile pattern is intransitive, it moves 
but it stays in our field of vision. 


Russell Hoban, Pilgermann (1983) 


TOPICS: Permutation groups, automorphism groups; orbits, transitivity, 
primitivity, generation 

TECHNIQUES: Group theory 

ALGORITHMS: Schreier-Sims algorithm 

CROSS-REFERENCES: Labelled and unlabelled structures (Chapter 2), 
permutations (Chapter 3), STS(7), [STS(9)] (Chapter 8), Petersen 
graph (Chapter 11), [Möbius function (Chapter 12)], cycle structure 
(Chapters 3, 13, 15) 


Groups perform two main functions in combinatorics, paradoxically opposed. On 
the one hand, they measure order. Any combinatorial object has an automorphism 
group; the larger the group, the more symmetrical the object. On the other, they 
measure disorder. The most familiar example of this is Rubik’s cube, whose possible 
configurations (more than $10^{19}$) are the elements of a group, only the identity of 
which corresponds to the completely ordered state. We'll see in this chapter that the 
same basic principles underlie the study of groups in both these roles. 


14.1. Three definitions of a group 


In this section, we'll re-write history a bit, tracing in idealised form the path from 
the definition of a group as ‘all symmetries of an object’ to the modern axiomatic 
definition. The point of this journey is to see how the various concepts are related. 

By an object I will mean a pair (X, S), where X is a set, and S any structure on 
X, whose exact nature needn’t be specified: it may be a set of unordered or ordered 
pairs (i.e. a graph or digraph), a set of subsets or partitions of X, or something 
more recondite (such as a set of paths of length 3 using vertices of X, or a set of 
weight functions on the edges of the complete graph on X). The point is that, given 
any permutation g on X, there should be a natural way of applying g to S. For 
example, if (X, S) is a graph, we apply g to each edge in S to obtain the edge set 
Sg. If S is a set of sets of ..., we apply this construction recursively. 

The permutation g of X is an automorphism of (X, S) if Sg = S. The automorphism 
group of (X, S) is the set Aut(X, S) of all automorphisms of (X, S). A subset 
G of Sym(X) is an automorphism group if G = Aut(X, S) for some structure S on 
X. This is our first ‘definition’ of a group. 

An automorphism group G has the following properties: 
(P1) it contains the identity permutation; 
(P2) it contains the inverse of each of its elements; 
(P3) it contains the composition of each pair of its elements. 
(The first condition is clear. For the second, if S = Sg, we can apply $g^{-1}$ to both 
sides, yielding $Sg^{-1} = S$. For the third, if Sg = Sh = S, then S(gh) = (Sg)h = S.) 

These facts form the basis of our second definition. A set G of permutations of 
X is a permutation group on X if it satisfies (P1), (P2) and (P3). We observed that 
every automorphism group is a permutation group; is the converse true, or have we 
strictly enlarged the domain of groups? 

It turns out that, indeed, every permutation group is the automorphism group 
of some object. (See Exercise 1 for a proof.) 

However, this is not the end of the story. Not every permutation group is the 
automorphism group of a graph, for example. (There are just two different graphs on 
the vertex set {1,2}, and both have two automorphisms. So the permutation group 
on this set which contains only the identity permutation is not the automorphism 
group of any graph. Note that the construction of Exercise 1 shows that it is the 
automorphism group of the digraph with edge (1,2).) The problem of deciding 
which permutation groups are automorphism groups of graphs is unsolved. 
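
For very small objects the automorphism group can be found by trying every permutation. The Python sketch below is a brute-force illustration, not an efficient method; the function name `automorphisms`, the representation of permutations as dictionaries and the `directed` flag are all assumptions. It reproduces the remark about graphs and digraphs on the vertex set {1, 2}.

```python
from itertools import permutations

def automorphisms(points, edges, directed=False):
    """All permutations g of `points` (as dicts) that map the edge set to itself."""
    def image(edge, g):
        moved = tuple(g[v] for v in edge)
        return moved if directed else frozenset(moved)

    target = {tuple(edge) if directed else frozenset(edge) for edge in edges}
    result = []
    for values in permutations(points):
        g = dict(zip(points, values))
        if {image(edge, g) for edge in edges} == target:
            result.append(g)
    return result

# The two graphs on {1, 2}: no edge, or the single edge {1, 2}.
print(len(automorphisms([1, 2], [])))
print(len(automorphisms([1, 2], [(1, 2)])))
# The digraph with the single directed edge (1, 2).
print(len(automorphisms([1, 2], [(1, 2)], directed=True)))
```

Because g is a bijection, comparing the image of the edge set with the edge set itself is enough to test Sg = S.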


The next step is in the spirit of nineteenth-century axiomatic mathematics. It 
was decided that the important thing about a group is the operation of composition. 
In terms of this, for example, we can characterise the identity permutation e by the 
fact that eg = ge = g for all permutations g, and the inverse $g^{-1}$ of a permutation 
g by $gg^{-1} = g^{-1}g = e$. Let us temporarily write the composition of g and h as g∘h. 
Now a permutation group G satisfies the following conditions: 

(A1) Associativity: g∘(h∘k) = (g∘h)∘k for all g, h, k ∈ G; 

(A2) Identity: there exists e ∈ G with e∘g = g∘e = g for all g ∈ G; 

(A3) Inverses: for any g ∈ G, there exists $g^{-1}\in G$ with $g\circ g^{-1} = g^{-1}\circ g = e$. 
Associativity is a general property of composition of functions: 

$$x(g\circ(h\circ k)) = (xg)(h\circ k) = ((xg)h)k = (x(g\circ h))k = x((g\circ h)\circ k).$$

We observed that the identity and inverse permutations have the required properties, 
and they are contained in G by (P1) and (P2). 

Cayley defined an abstract group to be a set G with a binary operation o defined 
on it satisfying (A1), (A2) and (A3). Thus, every permutation group is an abstract 
group. Again, we must ask whether the converse is true. The fact that it is, is the 
content (and the raison d’étre) of Cayley’s Theorem: 


(14.1.1) Cayley’s Theorem. Every abstract group is isomorphic to a permutation 
group. 




Proof. We are given an abstract group G, with operation ∘, and are required 
to find a permutation group G′ on a set X, whose elements are in one-to-one 
correspondence with those of G, such that the element of G′ corresponding to g∘h 
is the composition of the elements corresponding to g and h. 

We take X = G, and let $G' = \{\rho_g : g\in G\}$, where $\rho_g$ is the right translation by g: 

$$x\rho_g = x\circ g \quad\text{for all } x, g\in G.$$

It isn’t clear yet that $\rho_g$ is a permutation; but at least $\rho_g \ne \rho_h$ for g ≠ h (consider 
their effect on the element e), so that we have a one-to-one correspondence. Now 
we have 

$$x\rho_g\rho_h = (x\circ g)\circ h = x\circ(g\circ h) = x\rho_{g\circ h},$$

so the group operation in G corresponds to composition. From this, conditions 
(P1)-(P3) follow: closure is obvious ($\rho_g\rho_h = \rho_{g\circ h}$); $\rho_e$ is the identity permutation; 
and $\rho_{g^{-1}}$ is the inverse mapping to $\rho_g$ (from which it follows that $\rho_g$ is indeed a 
permutation). 
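
The construction in the proof can be carried out mechanically for any finite group given by its operation. The sketch below is illustrative: the helper names `right_regular` and `compose` are assumptions, and the sample group is chosen for convenience to be the six permutations of three points with composition as the operation, treated here simply as an abstract group with six elements.

```python
from itertools import permutations

def right_regular(elements, op):
    """Map each g to the right translation x -> x o g, stored as a dict on `elements`."""
    return {g: {x: op(x, g) for x in elements} for g in elements}

def compose(s, t):
    """Composite permutation: apply s first, then t (actions are on the right)."""
    return {x: t[s[x]] for x in s}

# Sample abstract group (an assumption for illustration): six elements, non-abelian.
elements = list(permutations(range(3)))

def op(g, h):
    """The group operation g o h: apply g first, then h."""
    return tuple(h[g[i]] for i in range(3))

rho = right_regular(elements, op)

# Each right translation is a permutation of the 6-element set X = G ...
print(all(sorted(rho[g].values()) == sorted(elements) for g in elements))
# ... and translation by g o h is translation by g followed by translation by h.
print(all(compose(rho[g], rho[h]) == rho[op(g, h)]
          for g in elements for h in elements))
```

Note that the resulting permutation group acts on six points, not three: the set permuted is the group itself.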


It follows of course that every abstract group is an automorphism group, so the 
three concepts are identical. More is true. Frucht showed that every abstract group 
is the automorphism group of a graph. (In Section 14.7, we outline a proof of this.) 
Frucht showed further that in fact this graph can be taken to be trivalent. A sheaf 
of similar results is known. 


From now on, we abbreviate ‘abstract group’ to ‘group’, and represent the group 
operation by juxtaposition gh instead of g∘h. Most accounts now go much further, 
hiding the origins of the concept by reversing the procedure. A group is defined by 
axioms (A1)-(A3);¹ Cayley’s Theorem shows that it makes sense to represent groups 
by means of permutations in order to study them (nothing is lost by this). Of course, 
the definition of a permutation group then changes: it is a set of permutations 
which, equipped with the operation of composition, forms a group! 

We need one more concept. This is because the construction in Cayley’s Theorem 
isn’t the only way in which a group can be represented by permutations. So we 
define an action of a group G on a set X to be a map θ from G to the set Sym(X) 
of permutations of X, satisfying 

$$(gh)\theta = (g\theta)(h\theta),$$
$$1\theta = 1,$$
$$g^{-1}\theta = (g\theta)^{-1},$$

where we used the same notation for group operations and permutations (juxtaposition, 
1, and $^{-1}$). In fact, the second and third conditions follow from the first, which 
says that θ is a homomorphism from G into Sym(X). 


1 Often ‘closure’ is given as an axiom. Since a binary operation is defined on all pairs, this is not 
necessary; it is a historical vestige, or ontogeny repeating phylogeny. 

The same group can have many different actions. We need to be able to say 
when two actions are ‘the same’. Let θ, φ be actions of G on sets X, Y. We call these 
actions equivalent if there is a bijection f : X → Y such that 

$$(xf)(g\phi) = (x(g\theta))f$$

for all x ∈ X, g ∈ G. In other words, if we use f to identify the sets X and Y, then 
any element of G induces the same permutation on the two sets. 

For an example, let G be the symmetric group $S_3$, regarded as the automorphism 
group of a triangle (Fig. 14.1). Then G acts on the vertices and on the edges of the 
triangle. These actions are equivalent by means of the map f, where 1f = {2,3}, 
2f = {3,1}, 3f = {1,2}. (For example, if a permutation g carries 1 to 2, then it 
carries {2,3} to {3,1}.) 

Fig. 14.1. A triangle 
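
The equivalence of the two actions of the triangle group can be verified exhaustively. In the sketch below (an illustration; the names `f`, `on_edge`, `group` and `equivalent` are assumptions), each of the six permutations of the vertices is applied both to a vertex x and to the opposite edge xf, and the defining condition for equivalence is tested.

```python
from itertools import permutations

vertices = [1, 2, 3]
# The bijection f from vertices to edges: each vertex goes to the opposite edge.
f = {1: frozenset({2, 3}), 2: frozenset({3, 1}), 3: frozenset({1, 2})}

def on_edge(edge, g):
    """Action on edges induced by a vertex permutation g (a dict)."""
    return frozenset(g[v] for v in edge)

# All six permutations of the vertices, each stored as a dict.
group = [dict(zip(vertices, image)) for image in permutations(vertices)]

# Equivalence: acting on the edge xf gives the same result as applying f to xg.
equivalent = all(on_edge(f[x], g) == f[g[x]] for g in group for x in vertices)
print(equivalent)
```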


14.2. Examples of groups 


Perhaps the most famous groups are the cyclic groups $C_n$. The group $C_n$ can 
be regarded as the additive group of congruence classes modulo n, or as the 
multiplicative group of all n-th roots of unity in ℂ (that is, $\{e^{2\pi ik/n} : k = 0,\dots,n-1\}$), 
or (for n > 2) as the automorphism group of the cyclic digraph with vertex set 
{0, 1, ..., n − 1} and edge set 

$$\{(i,i+1) : i = 0,\dots,n-2\}\cup\{(n-1,0)\}.$$

Algebraically, an important fact is that it is generated by a single element g, that is, 
all its elements are powers of g. Any finite group with this property is cyclic. 


(We say that a group G is generated by a set S of elements if each member of G 
can be expressed as a product of elements of S and their inverses, in any order and 
allowing repetitions. This is logically equivalent to saying that S is not contained in 
any proper subgroup of G, but expresses the concept in a more positive way. More 
generally, if S is a subset of a group G, the subgroup H generated by S consists 
of all products of elements of S and their inverses; it is also characterised as the 
smallest subgroup of G containing S, that is, the intersection of all subgroups of G 
containing S. Since every subset of Sym(X) generates some permutation group, we 
have a potentially enormous collection of groups; but it is quite difficult to deduce 
properties of the group from a generating set. We will consider this problem in 
Section 14.4.) 
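
For small degrees, the group generated by a set of permutations can be listed naively by repeatedly multiplying by generators until nothing new appears. The Python sketch below does this; it is a simple closure computation, not the Schreier-Sims algorithm, and the function name `generated_group` and the encoding of a permutation as a tuple of images of 0, ..., n − 1 are assumptions. In a finite group every inverse is a power of the element, so products of the generators alone suffice.

```python
def generated_group(generators):
    """Set of all permutations (tuples on 0..n-1) generated by `generators`."""
    n = len(generators[0])
    identity = tuple(range(n))
    group = {identity}
    frontier = [identity]
    while frontier:
        g = frontier.pop()
        for s in generators:
            product = tuple(s[g[i]] for i in range(n))   # apply g first, then s
            if product not in group:
                group.add(product)
                frontier.append(product)
    return group

rotation = (1, 2, 3, 4, 0)      # i -> i + 1 (mod 5)
reflection = (0, 4, 3, 2, 1)    # i -> -i (mod 5)
print(len(generated_group([rotation])))
print(len(generated_group([rotation, reflection])))
```

The running time is proportional to the order of the group, which is why this approach is only practical for small examples.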


A closely related group is the dihedral group $D_{2n}$ of order 2n. For n ≥ 3, 
$D_{2n}$ is the automorphism group of the cyclic (undirected) graph with vertex set 
{0, 1, ..., n − 1} and edge set 

$$\{\{i,i+1\} : i = 0,\dots,n-2\}\cup\{\{n-1,0\}\},$$

or the group of symmetries of a regular n-gon. It contains the cyclic group $C_n$ as a 
subgroup (the rotations of the n-gon). The remaining elements are reflections of the 
n-gon in its n axes of symmetry. (If n is odd, all axes of symmetry are alike; but, 
if n is even, there are two types, one joining opposite vertices and the other joining 
midpoints of opposite edges.) The dihedral groups can be defined consistently for 
smaller n: $D_2$ is the cyclic group of order 2 (generated by one reflection), and $D_4$ 
is the Klein group $V_4 = \{1,a,b,c\}$, where $a^2 = b^2 = c^2 = 1$, ab = c, bc = a, ca = b. 
Note that $V_4$ is the group of symmetries of a rectangle. 


We have already met the symmetric group Sym(X), consisting of all permutations 
of X. If |X| = n, it is also denoted by $S_n$, and its order is n!. We saw in Chapter 5 
that, for n > 1, $S_n$ has a subgroup of order n!/2 consisting of the even permutations 
of X, called the alternating group and denoted by Alt(X) or $A_n$. We see that $S_2$ is 
the cyclic group $C_2$, while $A_3$ and $S_3$ are isomorphic to $C_3$ and $D_6$ respectively. 

We met briefly the general linear group GL(n, q) consisting of all invertible n × n 
matrices over GF(q) in Chapter 9, where we calculated its order. 


Groups can be built up from smaller ones. Two important constructions are the 
direct product and wreath product, which we now define. 

Let G and H be permutation groups on sets X and Y respectively. We assume 
that X and Y are disjoint. The direct product G × H consists of all ordered pairs 
(g, h) with g ∈ G and h ∈ H, and acts on the disjoint union X ∪ Y in the following 
way: 

$$z(g,h) = \begin{cases} zg & \text{if } z\in X; \\ zh & \text{if } z\in Y. \end{cases}$$

(You should check that this is an action.) The group operation is given by 

$$(g_1,h_1)(g_2,h_2) = (g_1g_2,h_1h_2).$$

The action of G × H on X ∪ Y is called its natural action. Another action is its 
product action on X × Y, defined by 

$$(x,y)(g,h) = (xg,yh).$$


The wreath product G wr H is more difficult to define abstractly; I will describe 
it as a permutation group. Its natural action is on the set X × Y; but we take 
$Y = \{y_1,\dots,y_n\}$, and regard X × Y as the disjoint union of n copies $X_1,\dots,X_n$ of 
X, where $X_i = X\times\{y_i\}$. Now we define two permutation groups: 

• The bottom group B is the direct product of n copies of G, in its natural action 
on $X_1\cup\dots\cup X_n$. In other words, B acts by the rule 

$$(x,y_i)(g_1,\dots,g_n) = (xg_i,y_i).$$

• The top group T consists of H acting on the second coordinate: 

$$(x,y_i)h = (x,y_ih).$$

In other words, T shifts the sets $X_1,\dots,X_n$ around bodily. 
Now the wreath product G wr H is the group generated by B and T (and consists 
of all products bt for b ∈ B, t ∈ T). 

There is another action of the wreath product, the product action on the set 
$X^Y$ of all functions from Y to X. We can regard a function $f\in X^Y$ as an n-tuple 
$(f(y_1),\dots,f(y_n))$ of elements of X. Now the bottom group acts by 

$$f(g_1,\dots,g_n) = (f(y_1)g_1,\dots,f(y_n)g_n)$$

(in other words, the image of f under $(g_1,\dots,g_n)$ is the function f′ where 
$f'(y_i) = f(y_i)g_i$); and the top group acts by the rule that fh is the function f′, where 
$f'(y_i) = f(y_ih^{-1})$. 

Puzzles like Rubik's cube give rise to groups, which are most easily described by
giving sets of generators. As an example, easier than Rubik's cube, I will describe
Rubik's domino. This puzzle appears from the outside as a 3 × 3 × 2 rectangular
parallelepiped, divided into 18 unit cubes. In the starting position, the nine cubes in
one 3 × 3 face are coloured white, and those in the other square face are black; each
cube carries a number of spots of the other colour between 1 and 9, so that on the
white face the arrangement is as shown in Fig. 14.2, and each black cube has the
same number as the white cube with which it shares a face (giving the mirror image).
We will label the white cubes with capital letters from A to I,
and the black cubes with the corresponding lower-case letters a to i.


Fig. 14.2. Rubik's domino


A move consists of a rotation of a face of the parallelepiped. The square faces
can be rotated through 90°, 180° or 270°, while the rectangular faces can only be
rotated through 180°. Thus, moves correspond to powers of the six permutations

(A C I G)(B F H D)
(a c i g)(b f h d)
(A c)(C a)(B b)
(C i)(I c)(F f)
(I g)(G i)(H h)
(G a)(A g)(D d)

The domino group is the group of all permutations of the cubes which can be 
produced by applying a sequence of moves. It is the group generated by the above 
permutations; but, to see this, we must resolve one difficulty. 


The permutations listed above correspond to applying basic moves to the domino
in its ordered state. However, if it is disordered, different permutations result because
the cubes which are moved have different letters! A move can be regarded as a
fixed place-permutation, or permutation of the positions; but we have represented
states of the domino as entry-permutations, or permutations of the cubes. We must
examine the distinction.

Let g be a permutation of {1,...,n}. In two-line form, it is

\[ g = \begin{pmatrix} 1 & 2 & \ldots & n \\ 1g & 2g & \ldots & ng \end{pmatrix}. \]

If we compose g with the entry-permutation h, then the entry in position i, which is
ig, is replaced by its image under h, which is igh; the result is

\[ \begin{pmatrix} 1 & 2 & \ldots & n \\ 1gh & 2gh & \ldots & ngh \end{pmatrix}, \]

which is our usual composition of permutations. But if we compose g with the
place-permutation h, then the entry ig in position i is carried to position ih; the
result is

\[ \begin{pmatrix} 1h & 2h & \ldots & nh \\ 1g & 2g & \ldots & ng \end{pmatrix} = \begin{pmatrix} 1 & 2 & \ldots & n \\ 1h^{-1}g & 2h^{-1}g & \ldots & nh^{-1}g \end{pmatrix}, \]

so the effect is to compose the inverse of h with g. In particular, choosing g to be the
identity, we see that the place-permutation h corresponds to the entry-permutation
$h^{-1}$. So the rule for composing place-permutations is: compose the corresponding
entry-permutations from right to left.


In particular, the group generated by a set of permutations is the same, whether
they are place-permutations or entry-permutations. Thus, the domino group is
indeed generated by the six permutations displayed earlier.


14.3. Orbits and transitivity


If a group G acts on a set X, then as combinatorialists we are mainly interested in
X rather than G; we want to know what structures on X are left invariant by G, for
example. The action $\theta$ is a homomorphism from G to the symmetric group on X,
and its image is a permutation group. So we lose little by considering permutation
groups rather than abstract groups. (An algebraist, on the other hand, is more
concerned with G, and observes that the homomorphism has a kernel N, a normal
subgroup of G which measures exactly what is lost in passing to the permutation
group $G^\theta$.)

In any case, from now on, G will be either a permutation group on X or a
group acting on X; I will suppress the map $\theta$ in the notation, and write xg for the
image of x under (the permutation corresponding to) g.


Our first target is a generalisation of the cycle decomposition of a single permutation
(Chapter 3). Let G act on X. Define a relation $\equiv$ on X by the rule


$x \equiv y$ if and only if $xg = y$ for some $g \in G$.


(14.3.1) Proposition. $\equiv$ is an equivalence relation.

PROOF. There is a kind of historical inevitability about this result; most naturally
occurring equivalence relations in mathematics arise from group actions. The
three axioms for an equivalence relation (reflexivity, symmetry and transitivity)
are immediate consequences of the three axioms for a permutation group (identity,
inverses, and closure under composition). To take the second as an example: suppose
that $x \equiv y$. Then $xg = y$ for some $g \in G$; so $yg^{-1} = x$, and $y \equiv x$.


The equivalence classes of the relation = are known as the orbits of the group 
G. So we have, uniquely, a partition of X into orbits. G is said to be transitive if 
there is only one orbit, intransitive otherwise.² Note that, for intransitive G, we have
an action of G on each orbit, and these actions are transitive. So, if we want to
describe all the ways in which a group can act on a set, it suffices to describe the
transitive actions.³


EXAMPLE. The orbits of the domino group are 
{A,C,I,G,a,c,i,g} (corner cubes); 
{B, F, H, D,b, f,h,d} (edge cubes); 

{E} (white centre cube); 
{e} (black centre cube). 


To describe all the transitive actions, we introduce first a special class of these,
the coset actions. We show that any transitive action is equivalent to a coset action,
and we decide when two coset actions are themselves equivalent.

Let H be a subgroup of the group G. A right coset of H in G is a set of the
form $Hg = \{hg : h \in H\}$ for some fixed $g \in G$. We need the fact that any two
cosets are equal or disjoint. (This is the core of Lagrange's Theorem.) For this we
first show

if $g' \in Hg$, then $Hg' = Hg$.

For, if $g' \in Hg$, then $g' = h_0 g$ for some $h_0 \in H$; then any element $hg' \in Hg'$ lies in
Hg because $hg' = (hh_0)g$ and $hh_0 \in H$. Similarly, every element of Hg is in $Hg'$.

Now suppose that cosets $Hg_1$, $Hg_2$ are not disjoint; let $g' \in Hg_1 \cap Hg_2$. Then
$Hg_1 = Hg' = Hg_2$, as required.


Lagrange’s Theorem says that the order of a subgroup H of G divides the order 
of G. This now follows from the fact that a coset of H has the same number of 
elements as H itself. (The map $h \mapsto hg$ is a bijection from H to Hg.) We see that
the number of cosets of H is equal to |G|/|H|. (This number is called the index of 
H in G.) 


The coset space (G : H) is the set of right cosets of H in G. (It is often denoted 
by H\G, but this is easily confused with the set difference H \ G.) Now the coset 
action of G on (G : H) is given by the rule 


(Hk)g = H(kg). 


2 This is not the same as the distinction between transitive and intransitive motion made so
eloquently by Russell Hoban in the quote at the head of this chapter. Hoban’s dichotomy is closer 
to the difference between active and passive forms of a permutation. 

3 The algebraist’s job is harder. An intransitive permutation group is contained in the direct product 
of the transitive permutation groups induced on the orbits, but need not be the whole direct product. 
In other words, the permutation corresponding to g maps Hk to Hkg for all $k \in G$.
It is easily verified that this is indeed an action. 


As promised, we have the following two results. 
(14.3.2) Proposition. Any transitive action of G is equivalent to a coset action. 


PROOF. Let G act transitively on the set X. Choose a point $x \in X$, and let
$H = \{g \in G : xg = x\}$. Then H is called the stabiliser of x, and is written $G_x$ or
$\mathrm{Stab}_G(x)$. We have, by an easy check,

• H is a subgroup of G.
Also (and this is the heart of the matter),

• there is a natural bijection between X and (G : H).
The bijection is defined as follows. To each point $y \in X$ corresponds the subset
$S(y) = \{g \in G : xg = y\}$. The set S(y) is non-empty, by transitivity of G. The sets
S(y) (for $y \in X$) form a partition of G, and it is straightforward to identify it with
the partition into cosets of H. Finally,

• this bijection defines an equivalence of the actions of G.
In other words, if yg = z, then S(y)g = S(z); this follows from the definitions.


(14.3.3) Proposition. Two coset actions on (G : H) and (G : K) are equivalent if 
and only if the subgroups H and K are conjugate. 


PROOF. H and K are conjugate if $K = g_1^{-1} H g_1$ for some $g_1 \in G$. If this holds, then
the map $Kg \mapsto Hg_1 g$ is an equivalence. Conversely, suppose that actions on the
coset spaces of subgroups H and K are equivalent. Let K correspond to the coset
$Hg_1$ under the equivalence. Then the stabilisers of K and $Hg_1$ are equal. The first
is just K; the second is

\[ \{g \in G : Hg_1 g = Hg_1\} = \{g \in G : g_1 g g_1^{-1} \in H\} = g_1^{-1} H g_1. \]

So $K = g_1^{-1} H g_1$ is conjugate to H.

EXAMPLE. How many inequivalent actions of the symmetric group $S_3$ on {1,...,n} are there?


We first describe the transitive actions. $S_3$ is a group of order 6, containing
an identity, three elements of order 2, and two elements of order 3. By Lagrange's
Theorem, the possible orders of subgroups are 6, 3, 2 and 1. There is a unique
subgroup of each of the orders 6 and 1. Further, the identity and the two elements
of order 3 form the unique subgroup of order 3; and there are three subgroups
of order 2, each consisting of the identity and an element of order 2. These three
subgroups are all conjugate.⁴ So, up to equivalence, there is a unique transitive
action on a set of size 1, 2, 3 or 6, and no others.

Now an arbitrary action is made up of a disjoint union of these; so the number
$f_n$ of different actions on {1,...,n} is equal to the number of ways of expressing
n as a sum of ones, twos, threes and sixes. I claim that the generating function is
given by

\[ \sum_{n=0}^{\infty} f_n t^n = (1 - t)^{-1}(1 - t^2)^{-1}(1 - t^3)^{-1}(1 - t^6)^{-1}. \]

This is because the right-hand side is

\[ (1 + t + t^2 + \cdots)(1 + t^2 + t^4 + \cdots)(1 + t^3 + t^6 + \cdots)(1 + t^6 + t^{12} + \cdots), \]

and the coefficient of $t^n$ is the number of ways of getting a term $t^n$ by multiplying
$t^a$, $t^{2b}$, $t^{3c}$ and $t^{6d}$ for some a, b, c, d; that is, the number of expressions n =
a + 2b + 3c + 6d.

4 Their generators all have the same cycle structure; compare (13.1.2).

It is possible to find an explicit expression for $f_n$ from this formula. One way is
to use analytic tools. Cauchy's integral formula expresses $f_n$ as a contour integral,
which can be evaluated by calculating residues at poles, which occur at the sixth
roots of unity. But the digression would take us too far afield!
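
Short of an explicit formula, the first few values of $f_n$ can be read off by multiplying out the four geometric series, truncated at a chosen degree. The following is a minimal Python sketch of that computation; the function name `count_actions` and its parameters are illustrative choices, not notation from the text.

```python
def count_actions(max_n, parts=(1, 2, 3, 6)):
    """Return [f_0, ..., f_max_n], where f_n is the number of solutions of
    n = a + 2b + 3c + 6d in non-negative integers, i.e. the coefficients of
    the product of the series 1/(1 - t^k) for k in parts."""
    coeffs = [1] + [0] * max_n              # start from the constant series 1
    for k in parts:
        # Multiplying by 1/(1 - t^k) = 1 + t^k + t^(2k) + ... (truncated)
        # amounts to the in-place recurrence c[n] += c[n - k].
        for n in range(k, max_n + 1):
            coeffs[n] += coeffs[n - k]
    return coeffs


f = count_actions(12)
print(f)
```

Each entry `f[n]` counts the ways of writing n as a sum of ones, twos, threes and sixes, which by the argument above is the number of inequivalent actions of $S_3$ on an n-element set.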


Group actions clarify the distinction between labelled and unlabelled structures 
introduced in Section 2.5. Let $\mathcal{C}$ be a class of structures on a set {1,...,n}. ($\mathcal{C}$
might consist of graphs, families of sets, etc.) Two labelled structures C and C' are
counted as the same unlabelled structure if and only if they are isomorphic, that is,
there is an element of the symmetric group $S_n$ which maps C to C'. We consider the
action of $S_n$ on the class $\mathcal{C}$ of labelled structures. In this action, unlabelled structures
correspond to orbits; and the stabiliser of a structure C is its automorphism group
Aut(C), the set of all permutations fixing it.


(14.3.4) Theorem. (a) The number of different labellings of a structure C is equal to
$n!/|\mathrm{Aut}(C)|$.
(b) If there are M labelled structures and m unlabelled structures $C_1, \ldots, C_m$,
then

\[ \sum_{i=1}^{m} \frac{1}{|\mathrm{Aut}(C_i)|} = \frac{M}{n!}. \]


Consider, for example, Steiner triple systems on 9 points. Up to isomorphism, 
there is only one (Chapter 8, Exercise 3), and its automorphism group has order 432 
(Chapter 8, Exercise 4); so it can be labelled in 9!/432 = 840 ways. (This justifies 
the claim made in Chapter 8, Exercise 15.) 
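
Theorem 14.3.4 can be checked by brute force on a small class of structures. The sketch below takes graphs on four vertices as the class (an illustrative choice, not an example from the text), groups the labelled graphs into isomorphism classes, and compares both sides of parts (a) and (b). The function name `check_labelling_theorem` is hypothetical.

```python
from fractions import Fraction
from itertools import combinations, permutations
from math import factorial


def check_labelling_theorem(n):
    """Brute-force check of (14.3.4) for graphs on the vertex set {0,...,n-1}.
    Returns (M, m, sum of 1/|Aut(C_i)|)."""
    pairs = list(combinations(range(n), 2))
    perms = list(permutations(range(n)))

    def image(graph, p):
        # Apply the vertex permutation p to every edge of the graph.
        return frozenset(tuple(sorted((p[u], p[v]))) for u, v in graph)

    # All labelled graphs: every subset of the set of possible edges.
    labelled = [frozenset(e for i, e in enumerate(pairs) if (mask >> i) & 1)
                for mask in range(2 ** len(pairs))]

    seen, aut_orders = set(), []
    for graph in labelled:
        if graph in seen:
            continue
        orbit = {image(graph, p) for p in perms}       # all labellings of graph
        aut = sum(1 for p in perms if image(graph, p) == graph)
        assert len(orbit) == factorial(n) // aut       # part (a)
        seen |= orbit
        aut_orders.append(aut)

    total = sum(Fraction(1, a) for a in aut_orders)    # left side of part (b)
    return len(labelled), len(aut_orders), total


M, m, total = check_labelling_theorem(4)
print(M, m, total, Fraction(M, factorial(4)))
```

For n = 4 there are M = 64 labelled graphs in m = 11 isomorphism classes, and the sum of the reciprocals of the automorphism group orders equals 64/4!.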

We have more to say about counting unlabelled structures in the next chapter. 


14.4. The Schreier-Sims algorithm 


What is the order of the domino group? 

According to Lagrange's Theorem, if a group G acts on a set X, then the size
of the orbit of a point x is equal to the number of cosets of the stabiliser $G_x$ in G. We
can calculate this; and $G_x$ is a smaller group than G, so we could hope to calculate
its order, perhaps by a recursive procedure, and then find |G| by multiplying these
numbers. We see that what is really needed for this is a generating set for $G_x$. This
simple idea is formalised in the Schreier-Sims algorithm; as we'll see, it gives a lot
more information too.

First, we review how to compute orbits. Let S be a generating set for the group
G acting on X.


(14.4.1) Algorithm: Orbit of x
Start with $Y = \emptyset$. Add the point x to Y.

While any point was added to Y in the previous step, apply all
elements of S to the recently added points; whenever a point not
in Y is obtained, add it to Y.

At the conclusion, Y is the orbit of x.
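
Algorithm (14.4.1) translates almost line for line into code. The following Python sketch represents a permutation as a dictionary from points to images (an assumed encoding, as are the helper names `orbit` and `from_cycles`), and applies it to the six domino generators as they are listed in Section 14.2.

```python
def orbit(x, gens):
    """Orbit of the point x under the group generated by gens.
    Each generator is a dict sending a point to its image; points that do
    not occur in the dict are fixed."""
    found = {x}
    recent = [x]                        # points added in the previous step
    while recent:
        new = []
        for y in recent:
            for g in gens:
                z = g.get(y, y)
                if z not in found:      # a point not yet in Y: add it
                    found.add(z)
                    new.append(z)
        recent = new
    return found


def from_cycles(*cycles):
    """Build a permutation (as a dict) from disjoint cycles given as strings."""
    perm = {}
    for c in cycles:
        for i, p in enumerate(c):
            perm[p] = c[(i + 1) % len(c)]
    return perm


# The six domino generators (white cubes A..I, black cubes a..i).
domino = [
    from_cycles("ACIG", "BFHD"),
    from_cycles("acig", "bfhd"),
    from_cycles("Ac", "Ca", "Bb"),
    from_cycles("Ci", "Ic", "Ff"),
    from_cycles("Ig", "Gi", "Hh"),
    from_cycles("Ga", "Ag", "Dd"),
]

points = "ABCDEFGHIabcdefghi"
orbits = {frozenset(orbit(p, domino)) for p in points}
for o in sorted(orbits, key=lambda o: (-len(o), min(o))):
    print("".join(sorted(o)))
```

The four orbits found are the corner cubes, the edge cubes, and the two centre cubes, in agreement with the example in Section 14.3.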


While we do this calculation, we can record a witness for each point in the orbit,
a permutation carrying x to that point. If $S = \{g_1, \ldots, g_m\}$, this is conveniently
done by labelling a new point y with the number $i_1$ if it is the image of an earlier
point under $g_{i_1}$. Then the earlier point must be $y g_{i_1}^{-1}$. Either it is x, or it has a label
$i_2$, and in the latter case it is obtained by applying $g_{i_2}$ to $y g_{i_1}^{-1} g_{i_2}^{-1}$. Eventually we
have $y g_{i_1}^{-1} \cdots g_{i_r}^{-1} = x$, and so $y = x g_{i_r} \cdots g_{i_1}$. Note that we have not only an explicit
element carrying x to y, but even an expression for this element as a product of
generators.

In fact, all the orbits can be described in this way. We give x a negative label,
say −1, to distinguish it as an orbit representative. If Y = X, there is a single orbit;
otherwise, select an unlabelled point, give it the label −2, and proceed as before.
Eventually, every point is labelled, and the labels (together with the generators) give
a complete (and compact) description of the orbits and witnesses. The n-tuple of
labels is called a Schreier vector for G.

Let $Y = \{x = x_1, x_2, \ldots, x_s\}$ be the orbit of x, and let $k_i$ map x to $x_i$ as above,
for $i = 1, \ldots, s$ (with $k_1 = 1$). If $H = G_x$, then $Hk_1, \ldots, Hk_s$ are all the cosets of H
in G; in other words, $k_1, \ldots, k_s$ are coset representatives for H in G.


To find generators for the stabiliser, we use: 
(14.4.2) Schreier's Lemma. Let $\{g_1, \ldots, g_m\}$ generate a group G; let $k_1, \ldots, k_s$ be
coset representatives for a subgroup H of G. Let $\bar{g}$ denote the coset representative
of the element g; in other words, $\bar{g} = k_i$ if $Hg = Hk_i$. Assume that $k_1 = 1$. Then H
is generated by the set

\[ S_H = \{ k_i g_j (\overline{k_i g_j})^{-1} : i = 1, \ldots, s;\ j = 1, \ldots, m \}. \]

PROOF. All these elements lie in H, since each is the product of an element of G and
the inverse of its coset representative. Now suppose that $h = g_{i_1} g_{i_2} \cdots g_{i_r} \in H$. For
$j = 0, \ldots, r$, let $t_j = g_{i_1} \cdots g_{i_j}$ and let $u_j = \bar{t_j}$. Then, with $u_0 = 1$, we have

\[ h = u_0 g_{i_1} u_1^{-1} \cdot u_1 g_{i_2} u_2^{-1} \cdots u_{r-1} g_{i_r} u_r^{-1}, \]

since $u_0 = u_r = 1$ and all the other $u_j$ cancel with their inverses. But $u_{j-1} g_{i_j}$ lies in
the same coset as $u_j$; thus $u_{j-1} g_{i_j} u_j^{-1} \in S_H$, and we have expressed h as a product
of elements of $S_H$.
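
To see Schreier's Lemma at work, here is a small Python sketch for the case where H is a point stabiliser, so that the coset representatives are the orbit witnesses described above. Permutations are encoded as tuples on {0,...,n−1}, with `p[i]` the image of i; this encoding and all function names are assumptions made for the illustration.

```python
def compose(p, q):
    """Apply p first, then q (the book's convention: x(pq) = (xp)q)."""
    return tuple(q[i] for i in p)


def inverse(p):
    inv = [0] * len(p)
    for i, j in enumerate(p):
        inv[j] = i
    return tuple(inv)


def orbit_transversal(x, gens):
    """Map each point y in the orbit of x to a witness k_y with x k_y = y."""
    reps = {x: tuple(range(len(gens[0])))}
    queue = [x]
    for y in queue:                       # the queue grows while we iterate
        for g in gens:
            z = g[y]
            if z not in reps:
                reps[z] = compose(reps[y], g)
                queue.append(z)
    return reps


def schreier_generators(x, gens):
    """The set S_H of (14.4.2) for H = stabiliser of x."""
    reps = orbit_transversal(x, gens)
    result = set()
    for k in reps.values():
        for g in gens:
            kg = compose(k, g)
            # kg lies in the coset whose representative carries x to x(kg).
            result.add(compose(kg, inverse(reps[kg[x]])))
    return reps, result


def closure(gens, n):
    """All elements of the group generated by gens (small groups only)."""
    identity = tuple(range(n))
    elements, frontier = {identity}, [identity]
    for p in frontier:
        for g in gens:
            q = compose(p, g)
            if q not in elements:
                elements.add(q)
                frontier.append(q)
    return elements


# S_4 on {0,1,2,3}, generated by a 4-cycle and a transposition.
gens = [(1, 2, 3, 0), (1, 0, 2, 3)]
reps, stab_gens = schreier_generators(0, gens)
stabiliser = closure(stab_gens, 4)
print(len(reps), len(stabiliser))
```

In this example the orbit of 0 has four points and the Schreier generators generate a group of order 6 fixing 0, as expected for the stabiliser of a point in $S_4$.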
Now we can apply recursion, to get:


(14.4.3) Group order: Schreier-Sims algorithm
Let S be a set of generators for G.
If $S = \emptyset$ or $S = \{1\}$, then |G| = 1.
Otherwise, let x be a point not fixed by all elements of S;
calculate the orbit Y of x and Schreier generators for $G_x$. Apply
the algorithm recursively to find $|G_x|$. Then $|G| = |G_x| \cdot |Y|$.
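
The recursion in (14.4.3) can be sketched directly in Python. The version below follows the algorithm literally, with no filtering of the Schreier generators beyond discarding duplicates and the identity, so it is only suitable for small groups. Permutations are tuples on {0,...,n−1}; the function names are illustrative.

```python
def compose(p, q):
    """Apply p first, then q."""
    return tuple(q[i] for i in p)


def inverse(p):
    inv = [0] * len(p)
    for i, j in enumerate(p):
        inv[j] = i
    return tuple(inv)


def group_order(gens):
    """Order of the permutation group generated by gens, as in (14.4.3)."""
    # Discard duplicates and the identity.
    gens = [g for g in set(gens) if any(g[i] != i for i in range(len(g)))]
    if not gens:
        return 1
    n = len(gens[0])
    # A point not fixed by all generators.
    x = next(i for i in range(n) if any(g[i] != i for g in gens))
    # Orbit of x, with a witness (coset representative) for each point.
    reps = {x: tuple(range(n))}
    queue = [x]
    for y in queue:
        for g in gens:
            if g[y] not in reps:
                reps[g[y]] = compose(reps[y], g)
                queue.append(g[y])
    # Schreier generators for the stabiliser of x.
    stab_gens = {compose(compose(k, g), inverse(reps[compose(k, g)[x]]))
                 for k in reps.values() for g in gens}
    return len(reps) * group_order(stab_gens)


print(group_order([(1, 2, 3, 0), (1, 0, 2, 3)]))        # S_4
print(group_order([(1, 2, 0, 3), (0, 2, 3, 1)]))        # A_4
print(group_order([(1, 0, 3, 2), (2, 3, 0, 1)]))        # Klein group
```

Because nothing limits the growth of the generating sets from level to level, this sketch should not be expected to cope with a group as large as the domino group.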


But let's see what is really produced by this algorithm. We end up with a sequence
of points $x_1, x_2, \ldots, x_d$ and information about subgroups G(0), G(1), ..., G(d), where
G(0) = G, $G(i) = G(i-1)_{x_i}$ for $i = 1, \ldots, d$, and G(d) = {1}. In fact, for $i = 1, \ldots, d$,
we calculate a set $T_i$ of coset representatives for G(i) in G(i−1). Let $T = T_1 \cup \ldots \cup T_d$.
Then $(x_1, \ldots, x_d)$ is called a base for G (a base is a sequence of points such that
the stabiliser of all these points is the identity), and T is a strong generating set.
(We'll see soon that it really is a generating set.) Now $|T_i|$ is the index of G(i) in
G(i − 1); the order of G is the product of these indices:

\[ |G| = |T_1| \cdots |T_d|. \]

The recursive nature of the construction is reflected by the fact that $(x_2, \ldots, x_d)$ is
a base, and $T_2 \cup \ldots \cup T_d$ a strong generating set, for G(1).

We also have a membership test for G. This is a procedure which, given an
arbitrary permutation g, decides whether or not $g \in G$, and if so, expresses g in
terms of the generators.


(14.4.4) Membership test for G
GIVEN a permutation g of X.
If G = {1}, then $g \in G$ if and only if g = 1. Otherwise, is
$x_1 g = x_1 t_1$ for some $t_1 \in T_1$?
• If not, then $g \notin G$.
• If so, then apply the membership test for $G(1) = G_{x_1}$ to $g t_1^{-1}$;
and $g \in G$ if and only if $g t_1^{-1} \in G(1)$.


Note that this test is also recursive. If g passes the test, we will find unique
elements $t_1, t_2, \ldots, t_d$, with $t_i \in T_i$ for $i = 1, \ldots, d$, such that $g t_1^{-1} \cdots t_i^{-1} \in G(i)$ for
all i. Then we have $g t_1^{-1} \cdots t_d^{-1} = 1$, so $g = t_d \cdots t_1$. In other words, if $g \in G$, then we
find a unique expression for it as a product of elements of $T_d, \ldots, T_1$. This confirms
our formula for |G|. It also shows that T is indeed a generating set for G, as the
name 'strong generating set' suggested. Finally, the Schreier-Sims algorithm enables
us to express each element of T, and hence the arbitrary element g of G, in terms of
the original set S of generators.
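
The membership test can be sketched in the same style. The Python code below first builds the base and the transversals $T_i$ by the unfiltered recursion of (14.4.3), then sifts a permutation through them as in (14.4.4). It is a small-group illustration under assumed encodings (tuples on {0,...,n−1}; all names are hypothetical), not an efficient implementation.

```python
def compose(p, q):
    """Apply p first, then q."""
    return tuple(q[i] for i in p)


def inverse(p):
    inv = [0] * len(p)
    for i, j in enumerate(p):
        inv[j] = i
    return tuple(inv)


def stabiliser_chain(gens):
    """Return [(x_1, T_1), (x_2, T_2), ...]: base points with transversals,
    each T_i stored as a dict {image of x_i: coset representative}."""
    gens = [g for g in set(gens) if any(g[i] != i for i in range(len(g)))]
    if not gens:
        return []
    n = len(gens[0])
    x = next(i for i in range(n) if any(g[i] != i for g in gens))
    reps = {x: tuple(range(n))}
    queue = [x]
    for y in queue:
        for g in gens:
            if g[y] not in reps:
                reps[g[y]] = compose(reps[y], g)
                queue.append(g[y])
    stab_gens = {compose(compose(k, g), inverse(reps[compose(k, g)[x]]))
                 for k in reps.values() for g in gens}
    return [(x, reps)] + stabiliser_chain(stab_gens)


def is_member(g, chain):
    """Sift g through the chain; return (answer, [t_1, ..., t_d])."""
    factors = []
    for x, reps in chain:
        y = g[x]
        if y not in reps:             # x_i g is not x_i t for any t in T_i
            return False, factors
        t = reps[y]
        factors.append(t)
        g = compose(g, inverse(t))    # the new g fixes x_i
    return all(g[i] == i for i in range(len(g))), factors


# A_4, generated by two 3-cycles.
chain = stabiliser_chain([(1, 2, 0, 3), (0, 2, 3, 1)])
even = (1, 0, 3, 2)                   # (0 1)(2 3), an even permutation
odd = (1, 0, 2, 3)                    # (0 1), an odd permutation
in_group, factors = is_member(even, chain)

# Rebuild the element as t_d ... t_1.
rebuilt = tuple(range(4))
for t in factors:
    rebuilt = compose(t, rebuilt)
print(in_group, rebuilt == even, is_member(odd, chain)[0])
```

The product of the sizes of the transversals in `chain` gives the group order, and a successful sift returns the factors $t_1, \ldots, t_d$ whose product in reverse order recovers g.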

This is just what is needed to solve a puzzle like Rubik's cube or Rubik's domino.
We are presented with the puzzle in a disordered state, which is some well-defined 
permutation g of the initial state. We have to ascertain, first, if g is in the group 
generated by the moves (so that the given state could indeed have been obtained 
legally); and, if so, how to express g in terms of the generating permutations (so 
that, by reversing the sequence, we can return the puzzle to its initial state). 


There is one impractical feature of the algorithm as presented here. If the 
original group has s generators and acts on a set of size n, Schreier’s Lemma gives 
us a set of perhaps as many as sn generators for the stabiliser of a point. Then, the 
group G(i) fixing i base points might have up to $sn^i$ generators. Of course, G(d) is
the trivial group, so all its potential $sn^d$ generators collapse to the identity; and, if
we are lucky, the collapse may begin earlier. But, to make the algorithm efficient, 
it is necessary to have a ‘filter’ which reduces the number of generators to within 
a practical bound, without changing the group they generate. This can indeed be 
done; but we won’t pursue this here. 


THE DOMINO GROUP. Since we know that the domino group has orbits of sizes 8,
8, 1, 1, it must be a subgroup of the direct product $S_8 \times S_8$. (We can neglect the
two fixed points; now $S_8 \times S_8$ is the group of permutations which leave the other
two orbits fixed setwise.) Now it turns out that the group is in fact $S_8 \times S_8$. One
way to show this is to use the Schreier-Sims algorithm to calculate the order of the
group, which turns out to be $(8!)^2$. But a little hand calculation can be used to make
the job easier. If we compose the first and third displayed generators, we obtain the
permutation


(A a C I G c)(B F H D b).


The sixth power of this permutation is (B F H D b), which fixes all the corner
cubes and moves only the edge cubes. Now it can be shown that this and similar
permutations generate the alternating group $A_8$ of permutations of the edge cubes.
Similarly, the fifth power of the permutation above fixes all the edge cubes; it and
similar permutations generate the symmetric group $S_8$ on the corner cubes. Thus the
group contains at least $S_8 \times A_8$. But the first generator acts as an odd permutation
of the edge cubes. So the group is not $S_8 \times A_8$; and the only larger group it could
possibly be is $S_8 \times S_8$.
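
The hand calculation can be checked mechanically. This Python sketch composes the first and third domino generators (first generator applied first) and prints the cycle structure of the product and of its fifth and sixth powers. The dictionary encoding and the helper names are assumptions for the illustration.

```python
POINTS = "ABCDEFGHIabcdefghi"


def from_cycles(*cycles):
    """Permutation of POINTS (as a dict) from disjoint cycles given as strings."""
    perm = {x: x for x in POINTS}
    for c in cycles:
        for i, x in enumerate(c):
            perm[x] = c[(i + 1) % len(c)]
    return perm


def compose(p, q):
    """Apply p first, then q."""
    return {x: q[p[x]] for x in POINTS}


def power(p, k):
    result = {x: x for x in POINTS}
    for _ in range(k):
        result = compose(result, p)
    return result


def cycles_of(p):
    """Non-trivial cycles of p, each written as a string."""
    seen, cycles = set(), []
    for start in POINTS:
        if start in seen or p[start] == start:
            continue
        cycle, x = "", start
        while x not in seen:
            seen.add(x)
            cycle += x
            x = p[x]
        cycles.append(cycle)
    return cycles


first = from_cycles("ACIG", "BFHD")
third = from_cycles("Ac", "Ca", "Bb")
product = compose(first, third)

print(cycles_of(product))             # a 6-cycle on corners, a 5-cycle on edges
print(cycles_of(power(product, 6)))   # only the 5-cycle survives
print(cycles_of(power(product, 5)))   # only (the inverse of) the 6-cycle survives
```

The sixth power kills the 6-cycle and leaves the 5-cycle unchanged, while the fifth power kills the 5-cycle and inverts the 6-cycle, which is the separation of corner and edge cubes used in the argument.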


14.5. Primitivity and multiple transitivity 


Just as we've reduced the study of arbitrary group actions to transitive ones, it is 
possible to make further reductions. We now consider this, in rather less detail. 

Let G act on X. Remember that a relation on X is a set of ordered pairs 
of elements of X, that is, a subset of $X^2 = X \times X$. We say that the relation R
is preserved by G, or is G-invariant, if x R y implies xg R yg and conversely. (The
converse follows, by applying the inverse of g.) Now G acts on the set $X^2$, by the
rule

\[ (x, y)g = (xg, yg); \]

and we have the following:

(14.5.1) Proposition. The relation R is preserved by G if and only if it is a union of
orbits of G on $X^2$.


PROOF. G-invariance means that $(x, y) \in R$ if and only if $(xg, yg) \in R$ for any
$g \in G$; so the whole of the G-orbit of (x, y) is contained in R. Hence R is a union
of orbits. The converse is similar.


A G-congruence on X is an equivalence relation R on X which is preserved by 
G. (We don’t require that G fixes the equivalence classes of R.) There are always 
two trivial G-congruences (if |X| > 1): the relation of equality, and the ‘all’ relation 
R defined by the rule that x R y for all $x, y \in X$. The group G is called imprimitive
if there is a G-congruence other than these two, and primitive otherwise.

Let G be a transitive permutation group. If R is a non-trivial G-congruence,
let $X_1, \ldots, X_m$ be the congruence classes, and $Y = \{X_1, \ldots, X_m\}$ the set of classes.
Now we define two new permutation groups:

• G acts on the set Y; let $G_0$ be the permutation group on Y induced by G.
• Let H be the subgroup of G which fixes the set $X_1$ (not its pointwise stabiliser),
and $H_0$ the permutation group induced on $X_1$ by H.


(14.5.2) Theorem. G is isomorphic to a subgroup of the wreath product $H_0$ wr $G_0$;
and the given action is equivalent to the restriction to G of the natural action of the
wreath product.


Thus, G can be regarded as being built out of the smaller groups $H_0$ and
$G_0$. Both these groups are transitive. If either is imprimitive, we can continue the
reduction further. We end up with a collection of primitive groups, the primitive 
components of G. (But note that G may have several different congruences, which 
may give rise to different collections of primitive components.) 


Let t be a positive integer not exceeding |X|. A permutation group G on X 
is said to be t-transitive if, given any two t-tuples $(x_1, \ldots, x_t)$ and $(y_1, \ldots, y_t)$ of
distinct points of X, there is a permutation $g \in G$ with $x_i g = y_i$ for $i = 1, \ldots, t$.
(In other words, G acts transitively on the set of t-tuples of distinct points.) Now
1-transitivity is the same as transitivity (as defined in Section 14.3). 


(14.5.3) Proposition. Let G be t-transitive on X, with $t \ge 2$. Then
(a) G is (t − 1)-transitive;
(b) G is primitive.


PROOF. (a) Take two (t − 1)-tuples $(x_1, \ldots, x_{t-1})$ and $(y_1, \ldots, y_{t-1})$ of distinct
elements. Extend them to t-tuples by appending elements $x_t$ and $y_t$ respectively
which are not among the elements in the tuples already. Then choose g with $x_i g = y_i$
for $i = 1, \ldots, t$.

(b) We may assume that G is 2-transitive. Now any G-congruence R is a union
of orbits of G acting on $X^2$ (Proposition 14.5.1), necessarily containing the diagonal
$\Delta = \{(x, x) : x \in X\}$, since R is reflexive. But, if G is 2-transitive, it has just two
orbits on $X^2$, namely $\Delta$ and $X^2 \setminus \Delta$; so there are only two possible congruences.

If |X| = n, then the symmetric group on X is n-transitive. Also, the alternating
group is (n − 2)-transitive if n > 2. (Given (n − 2)-tuples $(x_1, \ldots, x_{n-2})$, $(y_1, \ldots, y_{n-2})$
of distinct points, there are just two permutations which carry the first to the second; they
differ by a transposition of the remaining points, so they have opposite parity, and
one of them is in the alternating group.)
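
For small groups these statements can be verified by listing all group elements and looking at the images of one fixed tuple. The Python sketch below does this; the function `transitivity_degree` and the tuple encoding of permutations are illustrative assumptions, and the method is brute force, feasible only for very small degree.

```python
from math import perm


def compose(p, q):
    """Apply p first, then q."""
    return tuple(q[i] for i in p)


def closure(gens, n):
    """All elements of the group generated by gens (small groups only)."""
    identity = tuple(range(n))
    elements, frontier = {identity}, [identity]
    for p in frontier:
        for g in gens:
            q = compose(p, g)
            if q not in elements:
                elements.add(q)
                frontier.append(q)
    return elements


def transitivity_degree(gens, n):
    """Largest t for which the group generated by gens is t-transitive on
    {0,...,n-1}; 0 if the group is not transitive."""
    group = closure(gens, n)
    t = 0
    while t < n:
        # Images of the tuple (0, 1, ..., t) under all group elements.
        images = {g[:t + 1] for g in group}
        # (t+1)-transitive iff every tuple of t+1 distinct points occurs.
        if len(images) < perm(n, t + 1):
            break
        t += 1
    return t


print(transitivity_degree([(1, 2, 3, 0), (1, 0, 2, 3)], 4))        # S_4
print(transitivity_degree([(1, 2, 0, 3), (0, 2, 3, 1)], 4))        # A_4
print(transitivity_degree([(1, 2, 3, 4, 0), (1, 2, 0, 3, 4)], 5))  # A_5
print(transitivity_degree([(1, 2, 3, 0)], 4))                      # C_4
```

The results agree with the text: the symmetric group of degree 4 is 4-transitive, the alternating groups of degrees 4 and 5 are 2- and 3-transitive respectively, and a cyclic group of order 4 is transitive but not 2-transitive.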

It is known that no other finite permutation group can be more than 5-transitive. 
This remarkable fact is a consequence of the classification of the finite simple groups, 
perhaps the greatest collective achievement of mathematicians; but the proof is more 
than ten thousand pages long, so I must ask you to take it on trust. 


14.6. Examples 


EXAMPLE: STS(7). We showed in Chapter 8 that there is a unique STS(7), up to 
isomorphism (see Fig. 14.3). In fact, the argument shows the following: 


Fig. 14.3. STS(7) 


Let (X, B) and (Y, C) be Steiner triple systems of order 7. Let
$(x_1, x_2, x_3)$ be a triangle in the first system, and $(y_1, y_2, y_3)$ a triangle
in the second. Then there is a unique isomorphism from the first
system to the second which maps $x_i$ to $y_i$ for i = 1, 2, 3.


For the isomorphism must map the third point on the block through $x_1$ and $x_2$ to
the third point on the block through $y_1$ and $y_2$, and similarly for the other two sides
of the triangle; then it maps the seventh point of X to the seventh point of Y. This
map really is an isomorphism: three of the remaining blocks consist of a vertex,
the 'third point' of the opposite side, and the 'seventh point' of the design; the last
block consists of the 'third points' of the three sides.

From this, we can calculate the order of the automorphism group of the Steiner 
system. By choosing the two systems to be equal (so that the isomorphisms are 
automorphisms), the number of automorphisms is equal to the number of (ordered) 
triangles, which is 7 · 6 · 4 = 168. We also see that a triangle is a base for the
automorphism group. 

Now the automorphism group is 2-transitive. (The proof is a modification of 
the proof of Proposition 14.5.3(a). Let $(x_1, x_2)$ and $(y_1, y_2)$ be two pairs of distinct
elements. Now choose $x_3$ so that $(x_1, x_2, x_3)$ is a triangle; and choose $y_3$ similarly.
Then choose an automorphism carrying the first triangle to the second.) In particular, 
it is primitive. 

We can put a name to this automorphism group. In Section 8.5, we saw that 
the points of the STS(7) can be labelled by the non-zero vectors of a 3-dimensional 
vector space V over GF(2), so that the blocks are the triples of points with sum
zero. Now the group GL(3, 2) of invertible 3 × 3 matrices over GF(2) acts on the
non-zero vectors in V, and obviously maps any block to a block; so it is a group of
automorphisms. But


\[ |\mathrm{GL}(3, 2)| = (2^3 - 1)(2^3 - 2)(2^3 - 2^2) = 168, \]


so this is the full automorphism group. 
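
Both claims are small enough to confirm by enumeration. The following Python sketch lists all 512 matrices of size 3 × 3 over GF(2), keeps the invertible ones, and checks that each maps the set of blocks (triples of non-zero vectors with sum zero) to itself. Vectors are treated as row vectors acting on the right; the helper names are illustrative.

```python
from itertools import product

# The 7 non-zero vectors of GF(2)^3: the points of the STS(7).
vectors = [v for v in product((0, 1), repeat=3) if any(v)]

# Blocks: triples {u, v, u + v}, i.e. triples of points with sum zero.
blocks = {frozenset((u, v, tuple((a + b) % 2 for a, b in zip(u, v))))
          for u in vectors for v in vectors if u != v}


def apply(matrix, v):
    """Row vector v times matrix, over GF(2)."""
    return tuple(sum(v[i] * matrix[i][j] for i in range(3)) % 2
                 for j in range(3))


def invertible(matrix):
    # A matrix over GF(2) is invertible iff it permutes the non-zero vectors.
    return {apply(matrix, v) for v in vectors} == set(vectors)


rows = list(product((0, 1), repeat=3))
group = [m for m in product(rows, repeat=3) if invertible(m)]

preserves_blocks = all(
    {frozenset(apply(m, v) for v in b) for b in blocks} == blocks
    for m in group
)
print(len(vectors), len(blocks), len(group), preserves_blocks)
```

The enumeration finds 7 points, 7 blocks and 168 invertible matrices, each of which preserves the block set.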


EXAMPLE: THE PETERSEN GRAPH. Recall the Petersen graph from Chapter 11 (see 
Fig. 14.4). (Ignore the labels for the moment.) 


Fig. 14.4. The Petersen graph 


We saw in Section 11.12 that any subgraph of shape can be completed 
in a unique way to a graph on 10 vertices with valency 3, diameter 2 and girth 5. 
This means, by the same kind of argument as we gave for the Steiner triple system, 
that the number of automorphisms of the Petersen graph is equal to the number of 
subgraphs of this type, which is 10 · 3 · 2 · 1 · 2 · 1 = 120.

Now consider the labels in Fig. 14.4. We have labelled each vertex with a 
2-element subset of {1,...,5}, so that all $\binom{5}{2} = 10$ 2-subsets are used. A little
checking shows that two vertices are adjacent if and only if their labels are disjoint.
It follows that any permutation of {1,...,5}, in its induced action on the 2-subsets,
is an automorphism; and we find a group of automorphisms isomorphic to $S_5$, with
order 120. So the full automorphism group is $S_5$.

Now the automorphism group is clearly transitive on vertices. It is not 2- 
transitive, since no automorphism can map two adjacent vertices to two non-adjacent 
vertices. However, we see that the orbits of $S_5$ on $X^2$ are three in number:

• the diagonal $\{(x, x) : x \in X\}$;
• the set $\{(x, y) : x \sim y\}$;
• the set $\{(x, y) : x \ne y, x \not\sim y\}$.

The automorphism group is transitive on (ordered) edges and on (ordered) non-edges.
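
The orbit count can be confirmed from the labelling by 2-subsets. In the Python sketch below, the vertices are the ten 2-subsets of {1,...,5}, adjacency means disjointness, and $S_5$ acts through its induced action on subsets; the helper names `adjacent` and `act` are illustrative.

```python
from itertools import combinations, permutations

# Vertices of the Petersen graph: the ten 2-subsets of {1,...,5}.
vertices = [frozenset(c) for c in combinations(range(1, 6), 2)]


def adjacent(u, v):
    return not (u & v)                # adjacent iff the labels are disjoint


def act(p, u):
    """Induced action of a permutation p of {1,...,5} on a 2-subset u;
    p is a tuple with p[i - 1] the image of i."""
    return frozenset(p[i - 1] for i in u)


group = list(permutations(range(1, 6)))           # S_5, 120 elements

# Every element of S_5 preserves adjacency, so it is an automorphism.
all_automorphisms = all(
    adjacent(act(p, u), act(p, v)) == adjacent(u, v)
    for p in group for u in vertices for v in vertices
)

# Orbits of S_5 on ordered pairs of vertices.
remaining = {(u, v) for u in vertices for v in vertices}
orbit_sizes = []
while remaining:
    u, v = next(iter(remaining))
    orb = {(act(p, u), act(p, v)) for p in group}
    orbit_sizes.append(len(orb))
    remaining -= orb

print(all_automorphisms, sorted(orbit_sizes))
```

The three orbits have sizes 10 (the diagonal), 30 (ordered edges) and 60 (ordered non-edges).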

From this information, we can show that $S_5$ is primitive on X. For a congruence
R must be a union of some of these three orbits, and must include the diagonal.
Suppose that it contains the second orbit (the ordered edges). Since we can find
vertices x, y, z with $x \sim y \sim z$ and $x \not\sim z$, we have x R y and y R z, so, by transitivity,
x R z. Thus R contains all the ordered non-edges as well, and is the universal
relation. A similar argument applies if R contains the orbit of non-edges. So either
R is the diagonal, or $R = X^2$. This means that the group is primitive.


14.7. Project: Cayley digraphs and Frucht’s Theorem 


Let S be a subset of a group G, not containing the identity. The Cayley digraph D(G; S) of G with
respect to S is defined to have vertex set G, and edges (g, sg) for each $s \in S$ and each $g \in G$. The
Cayley graph $\Gamma(G; S)$ is the underlying graph of D(G; S); that is, its vertex set is G, and it has edges
{g, sg} for each $s \in S$ and $g \in G$. We can regard the element s as a 'label' on the edges (g, sg) of
D(G; S), or the corresponding edges of $\Gamma(G; S)$. (Note that, if an element s and its inverse both lie
in S, they label the same edges of $\Gamma(G; S)$.)

Now the following holds. 


(14.7.1) Proposition. (a) D(G; S) is connected if and only if S generates G.
(b) For each $g \in G$, the map $\rho_g : x \mapsto xg$ is an automorphism of D(G; S).


PROOF. (a) If S generates G, then any $g \in G$ can be written as a product of elements of S and their
inverses. This product tells us how to find a path from the identity to g. For example, if $g = s_1 s_2^{-1} s_3$,
then we have an edge $(1, s_3)$ labelled $s_3$, an edge $(s_2^{-1} s_3, s_3)$ labelled $s_2$ (but going in the wrong
direction), and an edge $(s_2^{-1} s_3, s_1 s_2^{-1} s_3)$ labelled $s_1$.

(This argument shows that the digraph is connected (which means that the underlying graph
is connected, see Section 11.8), not that it is strongly connected. In fact, if G is finite, the strong
connectedness of D(G; S) follows from the connectedness (see Exercise 12).)

The converse is similar: any path from 1 to g in the underlying graph translates into a product
of elements of S and their inverses which is equal to g.

(b) A simple check: if (x, sx) is an edge, then so is $(x\rho_g, sx\rho_g) = (xg, sxg)$, by the associative
law.
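
Both parts of the proposition can be tried out on a small non-abelian group. The Python sketch below builds D(G; S) for G = $S_3$ (an illustrative choice), with permutations as tuples and the product sg meaning "apply s, then g", and tests connectedness of the underlying graph for a generating and a non-generating set S. The function names are hypothetical.

```python
from itertools import permutations


def compose(p, q):
    """Apply p first, then q."""
    return tuple(q[i] for i in p)


def cayley_digraph(group, S):
    """Edge set {(g, sg) : s in S, g in G} of D(G; S)."""
    return {(g, compose(s, g)) for s in S for g in group}


def is_connected(group, edges):
    """Connectedness of the underlying (undirected) graph."""
    neighbours = {g: set() for g in group}
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    start = group[0]
    seen, stack = {start}, [start]
    while stack:
        for w in neighbours[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(group)


group = list(permutations(range(3)))          # S_3
r, t = (1, 2, 0), (1, 0, 2)                   # a 3-cycle and a transposition

edges_rt = cayley_digraph(group, [r, t])      # {r, t} generates S_3
edges_r = cayley_digraph(group, [r])          # {r} generates only C_3

# Part (b): right multiplication x -> xg maps the edge set to itself.
right_mult_ok = all(
    {(compose(u, g), compose(v, g)) for u, v in edges_rt} == edges_rt
    for g in group
)
print(is_connected(group, edges_rt), is_connected(group, edges_r), right_mult_ok)
```

With S = {r, t} the digraph is connected, while with S = {r} it falls into two components, the cosets of the subgroup generated by r.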


Note that the permutations $\rho_g$ comprise the permutation group in the proof of Cayley's Theorem
(14.1.1), isomorphic to the abstract group G. So we have an action of G on the vertices of the Cayley
digraph or graph, as a group of automorphisms. Note that this action is transitive; for $\rho_{g^{-1}h}$ maps
g to h. We denote the permutation group by $\rho(G)$, to distinguish it from G (the set of points being
permuted): we are thinking here of $\rho$ as the action of G.

More is true: 


(14.7.2) Proposition. Suppose that S generates G. Then any automorphism of D(G; S) which preserves 
the labels on the edges belongs to $\rho(G)$.


PROOF. Let f be an automorphism which preserves the labels. Since all elements of $\rho(G)$ also
preserve labels, we can compose f with the element $\rho_{(1f)^{-1}}$ to obtain an automorphism fixing 1; and
this automorphism lies in $\rho(G)$ if and only if f does. So we may assume that f fixes 1. Now, for
each $s \in S$, there is a unique edge with label s and initial vertex 1 (namely (1, s)), and a unique edge
with label s and terminal vertex 1 (namely $(s^{-1}, 1)$). So f must fix all elements s or $s^{-1}$ for $s \in S$.
In this way we can work out through the digraph, and find that f fixes every element which is a
product of elements of S and their inverses. But, by assumption, these elements comprise all of G;
so $f = 1 \in \rho(G)$.


Now we can prove Frucht’s Theorem: 


(14.7.3) Theorem. Every finite group is the automorphism group of a finite graph.


PROOF. Let G be a finite group. We assume that G has at least 5 elements. (For smaller groups,
it’s not difficult to write down suitable graphs.) Now take S = G \ {1}, and construct the Cayley
digraph D(G; S). Since S generates G, we know from (14.7.2) that the group of automorphisms of
this digraph preserving the edge labels is isomorphic to G. The trick is to replace the labelled directed
edges with subgraph ‘gadgets’ to ensure that the automorphism group remains the same.

Let Y_n denote the graph with n + 4 vertices a, b, c, d, e_1, ..., e_n, having the following edges:
{a,b}, {b,c}, {c,d}, {b,e_1}, {e_i, e_{i+1}} (i = 1, ..., n - 1) (see Fig. 14.5). Now let S = {s_1, ..., s_{m-1}},
where m = |G|. Replace each edge (u,v) of D(G; S) with label s_i with a copy of the gadget Y_i,
where the vertices a and d of the gadget are identified with u and v. (All the added gadgets are
disjoint apart from these identifications.) Let Γ be the resulting graph. Thus, some vertices of Γ
are elements of G (coming from D(G; S)), while any other vertex belongs to a unique gadget.


Fig. 14.5. A gadget


Now observe:
• We can recognise the elements of G in Γ,

since they have valency m > 4 while any vertex of a gadget has valency at most 3. Moreover, edges
with the same label are replaced by isomorphic gadgets, so the label-preserving automorphisms of
D(G; S) extend to automorphisms of Γ; but we can recover the label and the orientation of the
edge joining any two elements of G from the gadget in Γ, so any automorphism of Γ induces a
label-preserving automorphism of D(G; S). Thus, Aut(Γ) ≅ G, as required.


14.8. Exercises 


1. Let G be a permutation group on X = {x_1, ..., x_n}. Regarding each permutation
g in ‘passive’ form, that is, as an n-tuple (x_1 g, ..., x_n g), show that the result of
applying the permutation h to the n-tuple g is the composition gh. Deduce that
Aut(X, G) = G.


2. Show that the symmetry group of the regular octahedron is the wreath product
S_2 wr S_3, having its natural action on the six vertices, and its product action on the
eight faces. Show that this group is also isomorphic to S_2 × S_4.


3. ‘Most naturally-occurring equivalence relations in mathematics arise from group
actions.’ Discuss. [Hint: You will find some useful examples in elementary linear
algebra.]


4. Consider the STS(7) in cyclic form: the point set is Z/(7), the blocks are
{013, 124, 235, 346, 450, 561, 602}. Clearly the permutation x ↦ x + 1 (the permutation
(0 1 2 3 4 5 6)) is an automorphism. Show that the permutation (2 6)(4 5) is
also an automorphism. Now show that these two automorphisms generate the full
automorphism group.


5. Show that any subgroup of the symmetric group of degree n can be generated by
at most n(n - 1)/2 elements.


6. Let G be the symmetric group S_n, acting on the set X of ordered pairs of
distinct elements from {1, ..., n}. Show that, for n ≥ 5, there are three non-trivial
G-congruences, defined as follows:

(x_1, x_2) R_1 (y_1, y_2) if and only if x_1 = y_1;

(x_1, x_2) R_2 (y_1, y_2) if and only if x_2 = y_2;

(x_1, x_2) R_3 (y_1, y_2) if and only if {x_1, x_2} = {y_1, y_2}.

What happens if n = 4?


7. A left coset of a subgroup H of a group G is a set gH = {gh : h ∈ H}. Prove that
(a) the numbers of left and right cosets are equal;
(b) there is a set of elements which are both right coset representatives and left coset
representatives.
[Hint for (b): Let 𝓛 and 𝓡 be the sets of left and right cosets. For each R ∈ 𝓡,
let A_R = {L ∈ 𝓛 : L ∩ R ≠ ∅}. Show that the family (A_R : R ∈ 𝓡) satisfies Hall’s
Condition (6.2.2). (This was essentially Hall’s original application of his theorem.)]
If G acts on X, and H = G_x, describe the left cosets of H in terms of the action
(analogous to the proof of (14.3.2)).


8. Show that all congruence classes of a congruence for a transitive group have the
same size. Deduce that a transitive group acting on a prime number of points is
primitive.


9. (a) Prove that a graph with 2-transitive automorphism group must be complete
or null.

(b) Find all graphs whose automorphism group is transitive on vertices, ordered
edges, and ordered non-edges, but is not primitive on vertices.


10. Let G be t-transitive on X. Prove that the number of orbits of G on the
Cartesian power X^t is the Bell number B(t). [Hint: For t = 3, the five orbits
consist of triples (x,x,x), (x,x,y), (x,y,x), (y,x,x), and (x,y,z), where x, y, z are
all distinct.]


11. Show that the two graphs of Fig. 14.6 are isomorphic. Hence write down 
automorphisms of order 3 and 5 of the Petersen graph. What is the group generated 
by these two automorphisms? 


Fig. 14.6. Two isomorphic graphs 


12. Let D be a finite digraph whose automorphism group acts transitively on its
vertices. Show that, if D is connected, then it is strongly connected. (In other words,
if you can walk from A to B, then you can drive.) Show that this conclusion is false
for infinite digraphs.


13. For each group G of order less than 7, find a graph whose automorphism group 
is G. 

14. (a) The centraliser of a permutation g ∈ S_n is the group of all permutations
h ∈ S_n which commute with g, that is, which satisfy gh = hg. Prove that the
centraliser of g is a subgroup of S_n, and is isomorphic to


    (C_1 wr S_{a_1}) × (C_2 wr S_{a_2}) × ... × (C_n wr S_{a_n}),


where g has cycle structure 1^{a_1} 2^{a_2} ... n^{a_n} (Section 13.1).

(b) Let Γ be a disconnected graph. Let Γ_1, ..., Γ_k be representatives of the isomorphism
types of the connected components of Γ, and suppose that a_i components
are isomorphic to Γ_i for i = 1, ..., k. Prove that


    Aut(Γ) = (Aut(Γ_1) wr S_{a_1}) × (Aut(Γ_2) wr S_{a_2}) × ... × (Aut(Γ_k) wr S_{a_k}).


15. (a) Prove that the cyclic group of order n contains φ(n) elements which generate
the group, where φ(n) is the number of residue classes mod n which are coprime to
n. (φ is Euler's function or the totient function.)

(b) Prove that 


where μ is the classical Möbius function.


16. Let G be a permutation group on X. For each subgroup H of G, let fix(H) be
the number of points of X which are fixed by every element of H. Prove that the
number of orbits of G on which no non-identity element of G fixes a point is


    \frac{1}{|G|} \sum_{H \le G} \mathrm{fix}(H)\, \mu(H, G),


where μ is the Möbius function of the lattice of subgroups of G (ordered by
inclusion).


‘I count a lot of things that there’s no need to count,’ Cameron said. ‘Just
because that’s the way I am. But I count all the things that need to be counted.’


Richard Brautigan, The Hawkline Monster (1974) 


Topics: Orbit-counting Lemma; cycle index; enumeration of func- 
tions by weight 


TECHNIQUES: Calculation of cycle index 
ALGORITHMS: 


CROSS-REFERENCES: Direct and wreath products (Chapter 14); Stir- 
ling numbers (Chapter 5); unlabelled structures (Chapters 2, 14); 
symmetric polynomials (Chapter 13) 


In this chapter, we develop a theory of counting which is associated with the names 
of Redfield and Pólya. Typical of these problems is that the configurations we count 
‘live’ on some basic object, and two of them should not be counted as different 
whenever one can be transformed into the other by a symmetry of the underlying 
object. One example of this setup is the counting of unlabelled graphs — review 
the remarks on this in Chapter 2 — where a graph ‘lives’ on a vertex set, and 
isomorphism of graphs is defined by means of permutations of the vertex set. For 
another example, we will count the number of necklaces that can be made using two 
colours of beads, two necklaces being counted as the same if one can be transformed 
into the other by a rotation of the necklace, or by picking it up and turning it over. 


15.1. The Orbit-counting Lemma


I said in the last chapter that naturally occurring equivalences usually come from 
group actions; that is, the equivalence classes are orbits of a group. The next result 
gives a formula for the number of orbits. 

Let G be a permutation group on a set X. For each element g ∈ G, we let fix(g)
denote the number of points x ∈ X fixed by g (that is, satisfying xg = x).


1 This result is commonly referred to as ‘Burnside’s Lemma’. It was given without attribution by 
Burnside in his book Theory of Groups of Finite Order, which introduced the French and German 
developments in the subject in the second half of the nineteenth century to English mathematicians; 
but it has been traced back to earlier work of Cauchy and Frobenius. I prefer the impersonal term 
given here.


(15.1.1) Orbit-counting Lemma
The number of orbits of a permutation group G is equal to the
average number of fixed points of its elements, viz.

    \frac{1}{|G|} \sum_{g \in G} \mathrm{fix}(g).


PROOF. We suppose first that G is transitive, and show that the expression in
the lemma equals 1, by double-counting pairs (x, g) with xg = x. On one hand,
the number of pairs is \sum_{g \in G} fix(g). On the other, it is \sum_{x \in X} |G_x|. Now, since G
is transitive, G_x has n cosets, where n = |X|; so the sum is \sum_{x \in X} |G|/n = |G|.
Equating the two expressions and dividing by |G| gives the result.
Now let G have t orbits X_1, ..., X_t, and let fix_i(g) be the number of fixed points
of g in X_i. Since G acts transitively on X_i, we have

    \frac{1}{|G|} \sum_{g \in G} \mathrm{fix}_i(g) = 1.

Also, we have fix(g) = \sum_{i=1}^{t} fix_i(g). So

    \frac{1}{|G|} \sum_{g \in G} \mathrm{fix}(g) = \sum_{i=1}^{t} \frac{1}{|G|} \sum_{g \in G} \mathrm{fix}_i(g) = t,

as required.


It is immaterial whether we have a permutation group or (more generally) an
action of a group in the Orbit-counting Lemma. For let θ be an action of G, with
kernel N. Then each permutation in the image of θ is the image of precisely |N|
elements of G (comprising a coset of N); also the order of G is |N| times as large as
that of its image, so the factors |N| cancel and the average number of fixed points
is the same for G and its image.
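
The lemma is easy to test by machine on a small example. The following is a minimal sketch, not part of the original development: it generates a permutation group from generators, counts orbits once directly and once as the average number of fixed points, and compares the two. The helper names and the choice of group (generated by (0 1 2) and (3 4)(5 6) on seven points) are assumptions made purely for illustration.

```python
def compose(p, q):
    """Apply p first, then q (permutations are tuples of images)."""
    return tuple(q[p[i]] for i in range(len(p)))


def generate_group(generators):
    """Close a list of permutations under composition (finite, so a group)."""
    identity = tuple(range(len(generators[0])))
    group, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for s in generators:
            h = compose(g, s)
            if h not in group:
                group.add(h)
                frontier.append(h)
    return group


def orbits_directly(group, n):
    """Count orbits on {0, ..., n-1} by sweeping out the images of each point."""
    seen, orbits = set(), 0
    for x in range(n):
        if x not in seen:
            orbits += 1
            seen.update(g[x] for g in group)
    return orbits


def orbits_by_lemma(group, n):
    """Average number of fixed points, as in (15.1.1)."""
    total = sum(1 for g in group for x in range(n) if g[x] == x)
    assert total % len(group) == 0  # the average is always an integer
    return total // len(group)


# Illustrative group: generated by (0 1 2) and (3 4)(5 6) acting on 7 points.
GENERATORS = [(1, 2, 0, 3, 4, 5, 6), (0, 1, 2, 4, 3, 6, 5)]
G = generate_group(GENERATORS)
print(len(G), orbits_directly(G, 7), orbits_by_lemma(G, 7))  # 6 3 3
```

Here the 6 group elements have 7 + 4 + 4 + 3 + 0 + 0 = 18 fixed points in all, and 18/6 = 3 agrees with the three visible orbits {0,1,2}, {3,4}, {5,6}.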


15.2. An application 


In how many ways can the faces of a cube be coloured with two 
colours? Assume that two coloured cubes which differ by a rotation 
are identical. 


The group in question is the group of rotations of the cube, which has order 24
(and happens to be isomorphic to S_4). We can list its elements as follows. Here, a
face-axis, edge-axis, or vertex-axis is an axis of symmetry joining centres of opposite
faces, midpoints of opposite edges, or opposite vertices, respectively.

Type   Axis         Order of rotation   No. of elements
1      (identity)   1                   1
2      Face         2                   3
3      Face         4                   6
4      Edge         2                   6
5      Vertex       3                   8

                                Total   24


Now we let X be the set of 2^6 = 64 colourings of the cube. We must calculate
fix(g) for each g ∈ G. A colouring is fixed by g if and only if all faces in the same
cycle of g have the same colour, so fix(g) = 2^{c(g)}, where c(g) is the number of cycles
of g on the faces of the cube.


Type   c(g)   fix(g)   Contribution
1      6      64       64
2      4      16       48
3      3      8        48
4      3      8        48
5      2      4        32

               Total   240


So the number of different cubes is 240/24 = 10. Can you describe them? 
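
The count can also be reproduced mechanically. The sketch below is an illustration under stated assumptions rather than the method of the text: it labels the faces 0 to 5 (up, down, front, back, left, right; my own labelling), generates the rotation group from two quarter-turns about perpendicular face-axes, and then averages r^{c(g)} over the group.

```python
def compose(p, q):
    """Apply p first, then q (permutations are tuples of images)."""
    return tuple(q[p[i]] for i in range(len(p)))


def generate_group(generators):
    """Close a list of permutations under composition."""
    identity = tuple(range(len(generators[0])))
    group, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for s in generators:
            h = compose(g, s)
            if h not in group:
                group.add(h)
                frontier.append(h)
    return group


def cycle_count(p):
    """Number of cycles of p, fixed points included."""
    seen, cycles = set(), 0
    for start in range(len(p)):
        if start not in seen:
            cycles += 1
            i = start
            while i not in seen:
                seen.add(i)
                i = p[i]
    return cycles


def count_colourings(group, r):
    """Orbit-counting Lemma: average of r^(number of cycles) over the group."""
    total = sum(r ** cycle_count(g) for g in group)
    return total // len(group)


# Faces (assumed labelling): 0 = up, 1 = down, 2 = front, 3 = back, 4 = left, 5 = right.
TURN_VERTICAL = (0, 1, 5, 4, 2, 3)   # front -> right -> back -> left -> front
TURN_FRONT = (5, 4, 2, 3, 0, 1)      # up -> right -> down -> left -> up
ROTATIONS = generate_group([TURN_VERTICAL, TURN_FRONT])

print(len(ROTATIONS))                   # 24
print(count_colourings(ROTATIONS, 2))   # 10
print(count_colourings(ROTATIONS, 3))   # 57
```

The same function answers the question for any number r of colours, which is the observation taken up in the next paragraph.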


It is clear that the same method would work for any number r of colours, 
giving the answer as a polynomial in r (Exercise 1). This observation motivates the 
introduction of generating functions and enumeration by cycle index, a topic we 
now consider. 


15.3. Cycle index 


Given a permutation g on X, there is a cycle decomposition of g, an expression for
g as a product of disjoint cycles, unique up to the starting points of the cycles and
the order of the factors (see Section 13.1). Let there be c_1 cycles of length 1, c_2 of
length 2, ..., c_n of length n, where n = |X|. (In the cycle notation for permutations,
we commonly suppress cycles of length 1, but it is important to count them here.)
We define the cycle index of g to be the monomial

    z(g; s_1, ..., s_n) = s_1^{c_1} s_2^{c_2} \cdots s_n^{c_n}

in indeterminates s_1, ..., s_n. Now, if G is a permutation group on X, the cycle index
of G is the average of the cycle indices of its elements:

    Z(G; s_1, ..., s_n) = \frac{1}{|G|} \sum_{g \in G} \prod_{i=1}^{n} s_i^{c_i(g)},

where c_i(g) is the number of cycles of g of length i. Just as in Section 15.1, if a
group G acts on a set X, the cycle index of G is the same as the cycle index of the
induced permutation group.

Before we prove the main result connecting cycle index with enumeration, we
give a couple of examples of its use. The group G has an induced action on the set
of all k-element subsets of X, and another on the set of k-tuples of distinct elements
of X, for each k with 1 ≤ k ≤ n. We let f_k and F_k be the numbers of orbits of G
in these actions. By convention, f_0 = F_0 = 1. So f_1 = F_1 is the number of orbits
of G on X, and F_k = 1 if and only if G is k-transitive on X (Section 14.5). We
show that the ordinary generating function for the numbers f_k, and the exponential
generating function for the F_k, can be calculated from the cycle index of G by simple
substitutions.
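
To make the definitions concrete, here is a minimal sketch (an illustration only; the function names and the dictionary representation of the cycle index are my own assumptions). It computes the cycle index of a group given by generators, substitutes 1 + t^i for s_i, and compares the resulting coefficients with a brute-force count of orbits on k-subsets. The example group is the dihedral group of order 8 acting on the four vertices of a square.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations


def compose(p, q):
    return tuple(q[p[i]] for i in range(len(p)))


def generate_group(generators):
    identity = tuple(range(len(generators[0])))
    group, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for s in generators:
            h = compose(g, s)
            if h not in group:
                group.add(h)
                frontier.append(h)
    return group


def cycle_type(p):
    """Return (c_1, ..., c_n): c_i is the number of cycles of length i."""
    counts, seen = [0] * len(p), set()
    for start in range(len(p)):
        if start in seen:
            continue
        length, i = 0, start
        while i not in seen:
            seen.add(i)
            i = p[i]
            length += 1
        counts[length - 1] += 1
    return tuple(counts)


def cycle_index(group):
    """Map each cycle type to its coefficient in Z(G)."""
    types = Counter(cycle_type(g) for g in group)
    return {ct: Fraction(m, len(group)) for ct, m in types.items()}


def poly_mul(a, b):
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def subset_series(index, n):
    """Coefficients of Z(G; 1 + t, 1 + t^2, ..., 1 + t^n)."""
    total = [Fraction(0)] * (n + 1)
    for ctype, coeff in index.items():
        term = [coeff]
        for i, c in enumerate(ctype, start=1):
            factor = [Fraction(1)] + [Fraction(0)] * (i - 1) + [Fraction(1)]
            for _ in range(c):
                term = poly_mul(term, factor)
        for k, x in enumerate(term):
            total[k] += x
    return total


def orbits_on_subsets(group, n, k):
    """Brute-force count of orbits on k-subsets of {0, ..., n-1}."""
    seen, orbits = set(), 0
    for subset in combinations(range(n), k):
        key = frozenset(subset)
        if key not in seen:
            orbits += 1
            seen.update(frozenset(g[x] for x in key) for g in group)
    return orbits


# Symmetries of a square: a quarter-turn and a reflection (illustrative choice).
D8 = generate_group([(1, 2, 3, 0), (0, 3, 2, 1)])
print([int(c) for c in subset_series(cycle_index(D8), 4)])   # [1, 1, 2, 1, 1]
print([orbits_on_subsets(D8, 4, k) for k in range(5)])       # [1, 1, 2, 1, 1]
```

The coefficient 2 of t^2 reflects the two kinds of pair of vertices of a square: adjacent and opposite.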


(15.3.1) Proposition. (a) \sum_{k=0}^{n} f_k t^k = Z(G; 1 + t, 1 + t^2, ..., 1 + t^n).

(b) \sum_{k=0}^{n} F_k t^k / k! = Z(G; 1 + t, 1, ..., 1).


PROOF. We let fix_k(g) and Fix_k(g) denote the number of k-subsets, or k-tuples of
distinct points, respectively, fixed by g.
(a) By the Orbit-counting Lemma,

    \sum_{k=0}^{n} f_k t^k = \frac{1}{|G|} \sum_{g \in G} \sum_{k=0}^{n} \mathrm{fix}_k(g)\, t^k.

Now consider a permutation g, with c_i(g) cycles of length i. For each choice of
numbers b_i ≤ c_i(g) with \sum_{i} i b_i = k, we can find k-sets fixed by g which consist of
b_i cycles of length i for i = 1, ..., n. Moreover, any fixed k-set is a union of cycles. So

    \mathrm{fix}_k(g) = {\sum}' \prod_{i=1}^{n} \binom{c_i(g)}{b_i},

where \sum' is over all (b_1, ..., b_n) with 0 ≤ b_i ≤ c_i(g) and \sum_{i} i b_i = k. But since we
then sum over k, this is just

    \sum_{k=0}^{n} f_k t^k = \frac{1}{|G|} \sum_{g \in G} \prod_{i=1}^{n} \sum_{b=0}^{c_i(g)} \binom{c_i(g)}{b} t^{ib}
                           = \frac{1}{|G|} \sum_{g \in G} \prod_{i=1}^{n} (1 + t^i)^{c_i(g)}
                           = Z(G; 1 + t, 1 + t^2, ..., 1 + t^n).

(The manipulations here are similar to those explained in more detail in Section 4.2.)

(b) This one is easier. A k-tuple is fixed if and only if all of its points are fixed; so

    \mathrm{Fix}_k(g) = c_1(g)(c_1(g) - 1) \cdots (c_1(g) - k + 1).

Thus,

    \sum_{k=0}^{n} F_k t^k / k! = \frac{1}{|G|} \sum_{g \in G} \sum_{k=0}^{n} \frac{c_1(g)(c_1(g) - 1) \cdots (c_1(g) - k + 1)}{k!}\, t^k
                                = \frac{1}{|G|} \sum_{g \in G} \sum_{k=0}^{n} \binom{c_1(g)}{k} t^k
                                = \frac{1}{|G|} \sum_{g \in G} (1 + t)^{c_1(g)}
                                = Z(G; 1 + t, 1, ..., 1).


Now we turn to the general situation. We take a collection of ‘figures’ φ_1, φ_2, ...,
each of which has a non-negative integral ‘weight’ w(φ_i). The set of figures needn’t
be finite (though it is in many applications), but we do assume that there are only
finitely many figures of any given weight. We can summarise this information in a
figure-counting series

    a(t) = \sum_{n \ge 0} a_n t^n,

where a_n is the number of figures of weight n.

Now we are given a permutation group G on a set X (typically G is the
automorphism group of some object) and we want to count the number of ways
of associating a figure with each point of X, two such ‘configurations’ being regarded
as identical for the purpose of the count if some element of G takes one to the
other. (Typically the ‘figures’ are colours, and we want to count the number of
inequivalent colourings, as in the example of the coloured cubes in the last section.)
An attachment of figures to points of X is defined by a function f : X → Φ, where
Φ is the set of figures; it has a total weight

    w(f) = \sum_{x \in X} w(f(x)).

Now G acts on the set of functions, by the rule

    (fg)(x) = f(xg^{-1}).

(The inverse is technically required to make this a valid action; but, informally,
it arises because we are regarding the elements of G as place-permutations here;
compare the discussion in Section 14.2.) We want to count the orbits of G on
functions, which we do by means of the function-counting series

    b(t) = \sum_{n \ge 0} b_n t^n,

where b_n is the number of orbits of G on functions of total weight n. (The action of
G doesn’t change weights of functions.)


(15.3.2) Cycle Index Theorem

    b(t) = Z(G; a(t), a(t^2), ..., a(t^n)).


Before proving the theorem, we show that part (a) of (15.3.1) is a consequence 
of it. We take two figures, with weights 0 and 1: we might as well call the figures 
themselves 0 and 1. The figure-counting series is just 1 + t. Now a function from X
to the set {0,1} is nothing but the characteristic function of a subset Y of X; and 
the action of G on functions is equivalent to its natural action on subsets. (If f is 
the characteristic function of Y, then 


    (fg)(x) = 1  ⇔  xg^{-1} ∈ Y  ⇔  x ∈ Yg,


so fg is the characteristic function of Yg.) So \sum f_k t^k is the function-counting series,
and the formula for it follows from the Theorem.

We take the proof in four steps.


STEP 1. Let Φ and Φ' be sets of figures, with figure-counting series a(t) and a'(t)
respectively. Then the generating function for counting pairs (φ, φ') ∈ Φ × Φ',
enumerated by the sum of the weights, is a(t)a'(t).

PROOF. If a(t) = \sum a_i t^i and a'(t) = \sum a'_i t^i, then the number of pairs (φ, φ') with
w(φ) = j, w(φ') = i - j, is a_j a'_{i-j}. Summing over j gives the total number of pairs
with w(φ) + w(φ') = i, and also the coefficient of t^i in a(t)a'(t).

STEP 2. Let Φ have figure-counting series a(t), and let X be an n-set. Then the
generating function for counting functions from X to Φ, enumerated by total weight
(that is, the function-counting series for the trivial group on X) is a(t)^n.


PROOF. If X = {x_1, ..., x_n}, then functions from X to Φ are represented by n-tuples
(f(x_1), ..., f(x_n)) of elements of Φ. The result now follows from Step 1 by induction.
Note that this is the special case of the Cycle Index Theorem for the trivial group
on X (whose cycle index is s_1^n).


STEP 3. The series enumerating functions from X to Φ fixed by a permutation g of
X, by total weight, is

    z(g; a(t), a(t^2), ..., a(t^n)).

PROOF. A function is fixed by g if and only if it is constant on the cycles of
g. So a fixed function is specified by giving, for each i, a function from a set
of representatives (c_i(g) in number) of the i-cycles of g, to Φ. For fixed i, these
functions are enumerated by a(t)^{c_i(g)}, by Step 2. However, since such a function has
each value repeated i times on X, its contribution to the total weight is multiplied
by i, so this contribution is enumerated by a(t^i)^{c_i(g)}. Now, by Step 1 and induction,
the overall generating function is obtained by multiplying these contributions for all
values of i; in other words, it is

    a(t)^{c_1(g)} a(t^2)^{c_2(g)} \cdots a(t^n)^{c_n(g)} = z(g; a(t), a(t^2), ..., a(t^n)),

as required.


STEP 4: COMPLETION OF THE PROOF. The number of orbits of G on functions of
weight k is the average number of fixed functions of weight k of its elements. By
Step 3, this is the coefficient of t^k in

    \frac{1}{|G|} \sum_{g \in G} z(g; a(t), a(t^2), ..., a(t^n)) = Z(G; a(t), a(t^2), ..., a(t^n)),

by definition of cycle index.


15.4. Examples 


We consider some applications of the Cycle Index Theorem. 


EXAMPLE 1: COLOURED CUBES. Consider the group of rotations of the cube, acting
on its faces. From the table in Section 15.2, the cycle index of this group is

    \frac{1}{24} (s_1^6 + 3 s_1^2 s_2^2 + 6 s_1^2 s_4 + 6 s_2^3 + 8 s_3^2).

We can refine our earlier count of ten red-and-blue cubes by enumerating them by
number of red faces. Thus, we let red have weight 1, and blue have weight 0. The
figure-counting series is 1 + t, so the function-counting series is

    \frac{1}{24} ((1 + t)^6 + 3(1 + t)^2 (1 + t^2)^2 + 6(1 + t)^2 (1 + t^4) + 6(1 + t^2)^3 + 8(1 + t^3)^2)
        = 1 + t + 2t^2 + 2t^3 + 2t^4 + t^5 + t^6.

This should check with the listing of cubes you gave in Section 15.2.

Again, to count the number of ways of colouring with a given number r of
colours, take all colours to have weight 0. Then the figure-counting series is r, and
the number of colourings is

    \frac{1}{24} (r^6 + 3r^4 + 12r^3 + 8r^2) = \frac{1}{24} r^2 (r + 1)(r^3 - r^2 + 4r + 8).

For example, when r = 3, there are 57 different coloured cubes.
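
Both substitutions in Example 1 are mechanical, and can be checked with a few lines of polynomial arithmetic. The sketch below is only an illustration: the cycle index is typed in by hand from the formula above, and the representation (a list of pairs: multiplicity, exponent dictionary) and the function names are my own assumptions.

```python
from fractions import Fraction

# (1/24)(s1^6 + 3 s1^2 s2^2 + 6 s1^2 s4 + 6 s2^3 + 8 s3^2), from Example 1.
# Each entry is (multiplicity, {cycle length i: exponent of s_i}).
CUBE_FACES = [
    (1, {1: 6}),
    (3, {1: 2, 2: 2}),
    (6, {1: 2, 4: 1}),
    (6, {2: 3}),
    (8, {3: 2}),
]
GROUP_ORDER = 24


def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def stretch(a, i):
    """Coefficient list of a(t^i), given the coefficient list of a(t)."""
    out = [Fraction(0)] * ((len(a) - 1) * i + 1)
    for k, coeff in enumerate(a):
        out[k * i] = Fraction(coeff)
    return out


def function_counting_series(terms, order, a):
    """Substitute a(t^i) for s_i in the cycle index, as in (15.3.2)."""
    total = []
    for mult, exponents in terms:
        term = [Fraction(mult, order)]
        for i, c in exponents.items():
            for _ in range(c):
                term = poly_mul(term, stretch(a, i))
        total += [Fraction(0)] * (len(term) - len(total))
        for k, x in enumerate(term):
            total[k] += x
    return total


# Red has weight 1 and blue weight 0: figure-counting series 1 + t.
by_red_faces = function_counting_series(CUBE_FACES, GROUP_ORDER, [1, 1])
print([int(c) for c in by_red_faces])   # [1, 1, 2, 2, 2, 1, 1]

# Three colours, all of weight 0: figure-counting series is the constant 3.
print(int(function_counting_series(CUBE_FACES, GROUP_ORDER, [3])[0]))   # 57
```

The coefficients of the first series sum to 10, the number of two-coloured cubes found in Section 15.2.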


EXAMPLE 2: NECKLACES. To count necklaces, we need to find the cycle indices of the
cyclic group (if we allow only rotations) and the dihedral group (if inversions are
allowed).

Euler’s function φ(n) (sometimes called the totient function) is the number of
congruence classes mod n which are coprime to n. For example, φ(12) = 4, and
φ(p) = p - 1 if p is prime. By convention, φ(1) = 1.


(15.4.1) Lemma. The cyclic group of order n contains, for each divisor d of n, φ(d)
elements of order d. Each has n/d cycles of length d.


PROOF. We can identify C_n with Z/(n). The order of a congruence class m
is the smallest positive x such that mx = ny for some y. If mx = ny, then
mx/(m,n) = ny/(m,n). Since m/(m,n) and n/(m,n) are coprime (we have divided
out the common factors of m and n), the least positive solution is x = n/(m,n),
y = m/(m,n). So the order of m is n/(m,n).

Now we reverse the argument and ask: how many classes m satisfy n/(m,n) = d
for a given divisor d of n? For such an m, we have m = (m,n)y, where (y,d) = 1;
there are φ(d) choices of y, each giving rise to a unique m = ny/d.

Note in particular that the number of elements of C_n which generate the group
is φ(n).

So the cycle index for C_n is

    Z(C_n) = \frac{1}{n} \sum_{d \mid n} \phi(d)\, s_d^{n/d}.


The dihedral group D_{2n} contains the cyclic group C_n, together with n reflections.
If n is odd, each reflection has one fixed point and (n - 1)/2 cycles of length 2;
while, if n is even, then half of them have no fixed points and n/2 2-cycles, and the
rest have two fixed points and (n - 2)/2 2-cycles. Thus

    Z(D_{2n}) = \frac{1}{2} (Z(C_n) + R_n),

where

    R_n = s_1 s_2^{(n-1)/2}                                  if n is odd,
    R_n = \frac{1}{2} (s_2^{n/2} + s_1^2 s_2^{(n-2)/2})      if n is even.

For example, when n = 10, we have

    Z(D_{20}) = \frac{1}{20} (s_1^{10} + 4 s_5^2 + 4 s_{10} + 6 s_2^5 + 5 s_1^2 s_2^4),

so the generating function for black and white necklaces by number of black beads
is

    1 + t + 5t^2 + 8t^3 + 16t^4 + 16t^5 + 16t^6 + 8t^7 + 5t^8 + t^9 + t^{10},

while the number of different necklaces with u colours of beads is

    \frac{1}{20} u(u + 1)(u^8 - u^7 + u^6 - u^5 + 6u^4 + 4).
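
The formulae for Z(C_n) and Z(D_{2n}) translate directly into a short program. What follows is a sketch under my own naming and data-structure assumptions (a cycle index is a list of pairs: coefficient, exponent dictionary); it is intended for n ≥ 3, where the dihedral group acts faithfully on the beads.

```python
from fractions import Fraction
from math import gcd


def phi(n):
    """Euler's function: the number of m in 1..n coprime to n."""
    return sum(1 for m in range(1, n + 1) if gcd(m, n) == 1)


def cyclic_index(n):
    """Terms (coefficient, {d: n/d}) of Z(C_n), one for each divisor d of n."""
    return [(Fraction(phi(d), n), {d: n // d})
            for d in range(1, n + 1) if n % d == 0]


def dihedral_index(n):
    """Terms of Z(D_2n) = (Z(C_n) + R_n) / 2."""
    terms = [(coeff / 2, exps) for coeff, exps in cyclic_index(n)]
    if n % 2 == 1:
        terms.append((Fraction(1, 2), {1: 1, 2: (n - 1) // 2}))
    else:
        terms.append((Fraction(1, 4), {2: n // 2}))
        terms.append((Fraction(1, 4), {1: 2, 2: (n - 2) // 2}))
    return terms


def poly_mul(a, b):
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def count_with_colours(terms, u):
    """Substitute s_i = u for all i: necklaces with u colours of beads."""
    return sum(coeff * u ** sum(exps.values()) for coeff, exps in terms)


def series_by_black_beads(terms, n):
    """Substitute s_i = 1 + t^i: necklaces by number of black beads."""
    total = [Fraction(0)] * (n + 1)
    for coeff, exps in terms:
        term = [coeff]
        for i, count in exps.items():
            factor = [Fraction(1)] + [Fraction(0)] * (i - 1) + [Fraction(1)]
            for _ in range(count):
                term = poly_mul(term, factor)
        for k, x in enumerate(term):
            total[k] += x
    return total


Z_D20 = dihedral_index(10)
print([int(c) for c in series_by_black_beads(Z_D20, 10)])
# [1, 1, 5, 8, 16, 16, 16, 8, 5, 1, 1]
print(int(count_with_colours(Z_D20, 2)))   # 78
```

The two outputs are consistent: the coefficients of the series sum to 78, the value of the colour-counting polynomial at u = 2.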


EXAMPLE 3: GRAPHS. A graph on the vertex set X is determined by its set of edges,
a subset of the set of all 2-subsets of X. So, if |X| = n, the number of labelled
graphs is 2^{\binom{n}{2}} = 2^{n(n-1)/2}, of which \binom{n(n-1)/2}{m} have m edges. How many unlabelled
graphs are there? We identify two graphs if some permutation of X maps one to the
other, so the group in question is the symmetric group S_n; but we have to calculate
the cycle index for its action on the 2-subsets of X.

The cycle index for the usual action of S_n is implicit in (13.1.5), where we
calculated, for each partition of n, the number of permutations with that partition
as cycle structure: that is, if λ = 1^{a_1} 2^{a_2} ... n^{a_n},

    Z(S_n) = \sum_{\lambda \vdash n} \prod_{i=1}^{n} \frac{s_i^{a_i}}{i^{a_i} a_i!}.

For the action we require, the conjugacy class sizes are the same, but the cycle 
structures are different. Rather than give an explicit formula, I will explain how the 
calculation is done, and work an example. 


Consider a permutation g of X. We consider two types of 2-subsets of X: those
contained within a cycle of g, and those which straddle two different cycles.

(i) In a cycle C of g of length m, if m is odd, there are (m - 1)/2 cycles of length
m on 2-sets; if m is even, there is one cycle of length m/2 (consisting of pairs
of points which are opposite in C) and (m - 2)/2 cycles of length m.

(ii) If two cycles have lengths m_1 and m_2, then there are (m_1, m_2) cycles of length
m_1 m_2/(m_1, m_2) on pairs consisting of one point from each cycle.


EXAMPLE: n = 4. Using the above rule, we find:


Cycle structure   Cycle structure   Number of
on points         on pairs          permutations
1^4               1^6               1
1^2 2^1           1^2 2^2           6
2^2               1^2 2^2           3
1^1 3^1           3^2               8
4^1               2^1 4^1           6


So the cycle index is

    \frac{1}{24} (s_1^6 + 9 s_1^2 s_2^2 + 8 s_3^2 + 6 s_2 s_4),

and the generating function for graphs by number of edges is

    1 + t + 2t^2 + 3t^3 + 2t^4 + t^5 + t^6,

which is easily checked by listing the graphs.
Note that the generating function gives us confidence that we haven’t overlooked
any possibilities!
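
For small n the whole calculation can be left to a machine: rather than applying rules (i) and (ii), one can simply let each permutation act on the 2-subsets and read off its cycles. The following sketch does this by brute force over all n! permutations (so it is only sensible for small n); the function names are my own.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations, permutations


def induced_on_pairs(g):
    """The permutation of the 2-subsets of {0, ..., n-1} induced by g."""
    pairs = list(combinations(range(len(g)), 2))
    position = {pair: k for k, pair in enumerate(pairs)}
    return tuple(position[tuple(sorted((g[a], g[b])))] for a, b in pairs)


def cycle_lengths(p):
    """Sorted tuple of the cycle lengths of p."""
    seen, lengths = set(), []
    for start in range(len(p)):
        if start in seen:
            continue
        length, i = 0, start
        while i not in seen:
            seen.add(i)
            i = p[i]
            length += 1
        lengths.append(length)
    return tuple(sorted(lengths))


def poly_mul(a, b):
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def graph_series(n):
    """Cycle types of S_n on pairs, and the series of graphs by number of edges."""
    group = list(permutations(range(n)))
    types = Counter(cycle_lengths(induced_on_pairs(g)) for g in group)
    total = [Fraction(0)] * (n * (n - 1) // 2 + 1)
    for lengths, mult in types.items():
        term = [Fraction(mult, len(group))]
        for length in lengths:
            # one factor 1 + t^length for each cycle on pairs
            term = poly_mul(term, [1] + [0] * (length - 1) + [1])
        for k, x in enumerate(term):
            total[k] += x
    return types, total


types4, series4 = graph_series(4)
print(dict(types4))
print([int(c) for c in series4])   # [1, 1, 2, 3, 2, 1, 1]
```

The dictionary of cycle types has the four entries (1,1,1,1,1,1): 1, (1,1,2,2): 9, (3,3): 8 and (2,4): 6, matching the cycle index above, and the series sums to 11, the number of unlabelled graphs on four vertices.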


15.5. Direct and wreath products 


There are simple formulae for calculating the cycle index of the direct or wreath 
product of two permutation groups (in its natural action) from those of the factors. 


(15.5.1) Proposition. Z(G × H) = Z(G)Z(H).


PROOF. We have

    Z(G \times H) = \frac{1}{|G \times H|} \sum_{(g,h) \in G \times H} z((g,h))
                  = \frac{1}{|G|\,|H|} \sum_{g \in G} \sum_{h \in H} z((g,h)),

and so we have to show that z((g,h)) = z(g)z(h) for any permutations g, h of
disjoint sets. But this is immediate from the fact that c_i((g,h)) = c_i(g) + c_i(h).
(Recall that the natural action is on the disjoint union of the sets.)


(15.5.2) Proposition. Z(G wr H) = Z(H; Z(G; s_1, s_2, ...), Z(G; s_2, s_4, ...), ...).


In other words, Z(G wr H) is obtained from Z(H) by substituting Z(G; s_i, s_{2i}, ...)
for s_i, for each i.


PROOF. Rather than direct calculation (which gives little insight), we will show that
the ‘recipe’ given by the right-hand side for calculating the function-counting series
is correct. Then we appeal to the principle that, for any permutation group K,
Z(K) is the unique polynomial in s_1, s_2, ... such that, for any figure-counting series
a(t), the function-counting series is obtained by substituting a(t^i) for s_i for each i.
However, I won’t give a proof of this principle.

So let G act on X and H on Y, where |X| = n, |Y| = m, say Y = {y_1, ..., y_m}.
Take a set Φ of figures with figure-counting series a(t). Recall that G wr H acts
on X × Y = \bigcup_{i=1}^{m} X_i, where X_i = X × {y_i}. Elements of the base group act as
independent m-tuples from G on the sets X_1, ..., X_m, while elements of H permute
these sets.

The counting series for functions on X fixed by G is c(t) = Z(G; a(t), a(t^2), ...).
Now we can regard these functions as forming a new set Ψ of ‘figures’. Functions f
from X × Y to Φ fixed by the base group can be identified with functions \bar{f} from Y
to Ψ; and f is fixed by G wr H if and only if \bar{f} is fixed by the top group, that is, by
H acting on Y. So, finally, the function-counting series for G wr H is

    Z(H; c(t), c(t^2), ...) = Z(H; Z(G; a(t), a(t^2), ...), Z(G; a(t^2), a(t^4), ...), ...).

This is exactly what is obtained from the right-hand side of the proposition by
substituting a(t^i) for s_i for each i.


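EXAMPLE. S_2 and S_3 have cycle indices \frac{1}{2}(s_1^2 + s_2) and \frac{1}{6}(s_1^3 + 3 s_1 s_2 + 2 s_3). So the
cycle index for S_2 wr S_3 is

    \frac{1}{6} ( \frac{1}{8}(s_1^2 + s_2)^3 + \frac{3}{4}(s_1^2 + s_2)(s_2^2 + s_4) + (s_3^2 + s_6) )
        = \frac{1}{48} (s_1^6 + 3 s_1^4 s_2 + 9 s_1^2 s_2^2 + 7 s_2^3 + 6 s_1^2 s_4 + 6 s_2 s_4 + 8 s_3^2 + 8 s_6).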


Check this by using the fact that this group is isomorphic to the group of 
symmetries of the cube acting on its faces. 
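
The substitution in (15.5.2) is easy to get wrong by hand, so here is a minimal sketch of it. The representation is an assumption of mine: a monomial is a sorted tuple of subscripts with repetition (so s_1^2 s_2 is (1, 1, 2)), and a cycle index is a dictionary from monomials to rational coefficients.

```python
from collections import defaultdict
from fractions import Fraction


def multiply(p, q):
    """Product of two polynomials in s_1, s_2, ..."""
    out = defaultdict(Fraction)
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            out[tuple(sorted(m1 + m2))] += c1 * c2
    return dict(out)


def inflate(p, i):
    """Replace each s_j by s_{ij}: turns Z(G; s_1, s_2, ...) into Z(G; s_i, s_2i, ...)."""
    return {tuple(i * j for j in mono): c for mono, c in p.items()}


def wreath(z_g, z_h):
    """Cycle index of G wr H: substitute inflate(Z(G), i) for s_i in Z(H)."""
    total = defaultdict(Fraction)
    for mono, coeff in z_h.items():
        term = {(): Fraction(1)}
        for i in mono:
            term = multiply(term, inflate(z_g, i))
        for m, x in term.items():
            total[m] += coeff * x
    return dict(total)


Z_S2 = {(1, 1): Fraction(1, 2), (2,): Fraction(1, 2)}
Z_S3 = {(1, 1, 1): Fraction(1, 6), (1, 2): Fraction(1, 2), (3,): Fraction(1, 3)}

Z_WREATH = wreath(Z_S2, Z_S3)
for mono in sorted(Z_WREATH):
    print(mono, Z_WREATH[mono] * 48)
```

Multiplying each coefficient by 48 recovers the integers 1, 3, 9, 7, 6, 6, 8, 8 of the expanded formula in the example; they sum to 48, the order of the group.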


15.6. Stirling numbers revisited 


In the preamble to (15.3.1), we introduced the notation F_n for the number of orbits
of a permutation group G on the set of n-tuples of distinct points of X. Now let F*_n
be the number of orbits on all n-tuples from X (that is, on the set X^n). The next
result gives the relationship between these sequences.


(15.6.1) Proposition. F*_n = \sum_{k=1}^{n} S(n,k) F_k,
where S(n,k) is the Stirling number of the second kind.


PROOF. Given an n-tuple (x_1, ..., x_n), we construct from it a partition of {1, ..., n},
corresponding to the equivalence relation in which i ≡ j if and only if x_i = x_j. If the
partition has k parts, then the n-tuple has k distinct entries; let these be (y_1, ..., y_k)
(in order of appearance). Now two n-tuples lie in the same orbit of G if and only if
both
(a) the partitions of {1, ..., n} they define are the same; and
(b) the corresponding tuples (y_1, ..., y_k) and (y'_1, ..., y'_k) of distinct elements lie in
the same orbit of G.
Now there are S(n,k) partitions with k parts, and for each partition there are F_k
orbits of G on k-tuples of distinct elements. Multiplying, and summing over k, gives
the result.


We examine two extreme cases of this result.
1. If G is n-transitive, then F_k = 1 for k ≤ n, and so F*_n = \sum_{k=1}^{n} S(n,k) = B(n), the
Bell number (see Exercise 10 of Chapter 14).


2. Take G to be the trivial group on a set of size t. We have F_n = t(t - 1) ... (t -
n + 1) = (t)_n, and F*_n = t^n. So

    t^n = \sum_{k=1}^{n} S(n,k) (t)_k.


Since this is true for all positive integers t, it is a polynomial identity. (Compare
(5.3.3(b)).)

Using the Orbit-counting Lemma, we can give the combinatorial proof of the
‘reverse’ of this last formula, promised in Chapter 5, namely

    (t)_n = \sum_{k=1}^{n} s(n,k) t^k.

For this, we consider the set of functions from X to {1, ..., t}, where |X| = n, and
count orbits of S_n on this set. A function is fixed by a permutation g if and only
if it is constant on the cycles of g. Since there are, by definition, (-1)^{n-k} s(n,k)
permutations in S_n with k cycles, the number of orbits is

    \frac{1}{n!} \sum_{k=1}^{n} (-1)^{n-k} s(n,k) t^k = \frac{(-1)^n}{n!} \sum_{k=1}^{n} s(n,k) (-t)^k.

But the number of orbits on such functions is just the number of choices of n things
from a set of size t, with order unimportant and repetitions allowed. By (3.7.1), this
number is

    \binom{n + t - 1}{n} = (-1)^n \binom{-t}{n} = \frac{(-1)^n}{n!} (-t)_n.

So

    (-t)_n = \sum_{k=1}^{n} s(n,k) (-t)^k.

This holds for all positive integers t, and so it is a polynomial identity. Now
substituting -t for t gives the required result.
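
The identities of this section are all easy to test numerically. Here is a small sketch; it assumes the usual recurrences S(n,k) = S(n-1,k-1) + k S(n-1,k) and s(n,k) = s(n-1,k-1) - (n-1) s(n-1,k) for the Stirling numbers, which are not restated in this section, and the function names are my own.

```python
from math import comb, factorial


def stirling2(n, k):
    """Stirling number of the second kind S(n, k), by the usual recurrence."""
    if n == k:
        return 1
    if k == 0 or k > n:
        return 0
    return stirling2(n - 1, k - 1) + k * stirling2(n - 1, k)


def stirling1(n, k):
    """Signed Stirling number of the first kind s(n, k)."""
    if n == k:
        return 1
    if k == 0 or k > n:
        return 0
    return stirling1(n - 1, k - 1) - (n - 1) * stirling1(n - 1, k)


def falling(t, n):
    """The falling factorial (t)_n = t(t-1)...(t-n+1)."""
    out = 1
    for i in range(n):
        out *= t - i
    return out


def check(n, t):
    """Test the identities of this section for one pair (n, t)."""
    ks = range(1, n + 1)
    second = t ** n == sum(stirling2(n, k) * falling(t, k) for k in ks)
    first = falling(t, n) == sum(stirling1(n, k) * t ** k for k in ks)
    # |s(n,k)| permutations have k cycles, each fixing t^k functions.
    fixed_total = sum(abs(stirling1(n, k)) * t ** k for k in ks)
    orbits = fixed_total == factorial(n) * comb(n + t - 1, n)
    return second and first and orbits


print(all(check(n, t) for n in range(1, 7) for t in range(1, 7)))   # True
print(sum(stirling2(4, k) for k in range(1, 5)))                    # 15, the Bell number B(4)
```

Of course a finite check is no proof, but since both sides are polynomials in t of degree n, agreement at more than n values of t does establish each identity for that n.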


15.7. Project: Cycle index and symmetric functions 


Recall from Section 13.5 the notion of a symmetric polynomial in the indeterminates x_1, ..., x_n,
and some special symmetric polynomials: the elementary symmetric function e_r, the sum of all
products of r distinct indeterminates; the complete symmetric function h_r, the sum of all products
of r indeterminates (repetitions allowed); and the power sum function p_r = x_1^r + ... + x_n^r. We'll see
that the cycle index of the symmetric group is a recipe for expressing h_n in terms of the power sum
functions; and the alternating group plays a similar rôle with respect to the function e_n.


Recall also from Section 13.5
• the generating functions
    E(t) = \sum_{r \ge 0} e_r t^r,
    H(t) = \sum_{r \ge 0} h_r t^r,
    P(t) = \sum_{r \ge 1} p_r t^{r-1};
• the formula P(t) = \frac{d}{dt} \log H(t).
In addition, recall the formula from Section 13.1 for the number c_λ of permutations with cycle
structure λ, viz.

    c_\lambda = \frac{n!}{1^{a_1} a_1!\, 2^{a_2} a_2! \cdots},

where λ = 1^{a_1} 2^{a_2} ....


(15.7.1) Proposition. h_n = Z(S_n; p_1, p_2, ..., p_n).


PROOF. We have

    H(t) = \exp \int P(t)\, dt
         = \exp \sum_{r \ge 1} \frac{p_r t^r}{r}
         = \prod_{r \ge 1} \exp(p_r t^r / r)
         = \prod_{r \ge 1} \sum_{a_r \ge 0} \frac{(p_r t^r)^{a_r}}{r^{a_r} a_r!}.

Now the coefficient of t^n in the right-hand side is made up of a sum of terms, one for each expression
n = \sum r a_r; that is, one for each partition λ = 1^{a_1} 2^{a_2} ... ⊢ n. The contribution which comes from
the partition λ is

    \prod_{r \ge 1} \frac{p_r^{a_r}}{r^{a_r} a_r!} = \frac{c_\lambda}{n!} \prod_{r \ge 1} p_r^{a_r},

which is precisely the contribution to Z(S_n; p_1, p_2, ...) from permutations with cycle structure λ.
Summing over λ gives the result, since the coefficient of t^n on the left-hand side is just h_n.
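
Proposition (15.7.1) can be checked numerically by giving the indeterminates integer values. The sketch below is an illustration with names of my own choosing: it computes h_n directly from its definition, and compares it with the cycle index of S_n (written as a sum over partitions, as in the proof) evaluated at the power sums.

```python
from fractions import Fraction
from itertools import combinations_with_replacement
from math import factorial, prod


def partitions(n, largest=None):
    """Yield the partitions of n as non-increasing lists of parts."""
    if largest is None:
        largest = n
    if n == 0:
        yield []
        return
    for part in range(min(n, largest), 0, -1):
        for rest in partitions(n - part, part):
            yield [part] + rest


def complete_symmetric(xs, n):
    """h_n: the sum of all products of n of the values xs, repetitions allowed."""
    return sum(prod(choice) for choice in combinations_with_replacement(xs, n))


def cycle_index_at_power_sums(xs, n):
    """Z(S_n; p_1, ..., p_n) with p_r = sum of x^r over the values xs."""
    p = {r: sum(x ** r for x in xs) for r in range(1, n + 1)}
    total = Fraction(0)
    for lam in partitions(n):
        term = Fraction(1)
        for r in set(lam):
            a = lam.count(r)   # a_r, the number of parts equal to r
            term *= Fraction(p[r] ** a, r ** a * factorial(a))
        total += term
    return total


print(complete_symmetric([1, 2], 2))          # 7
print(cycle_index_at_power_sums([1, 2], 2))   # 7
```

For instance, with x_1 = 1 and x_2 = 2 we have h_2 = 1 + 2 + 4 = 7, while p_1 = 3 and p_2 = 5 give (p_1^2 + p_2)/2 = 7.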


Without proof, I will mention the analogous result for the alternating group. 

(15.7.2) Proposition. h_n + e_n = Z(A_n; p_1, p_2, ...) for n ≥ 2.


15.8. Exercises


1. Use the Orbit-counting Lemma to find a formula for the number of ways of 
colouring the faces of a cube with r colours, up to rotations. Repeat for colourings 
of the edges, and of the vertices. 


2. Find the cycle index of the group S_5 acting on 2-subsets. Hence enumerate graphs
on 5 vertices by number of edges. 


3. Prove Proposition 15.5.1 in the spirit of Proposition 15.5.2. (You will probably 
find Step 1 in the proof of the Cycle Index Theorem useful.) 


4. Show that the cycle index of a direct product of permutation groups, in its
product action, can in principle be calculated from the cycle indices of the factors.
Perform the calculation for S_3 × S_3. Hence enumerate the 3 × 3 matrices of zeros
and ones, up to row and column permutations, by number of ones.


5. Calculate the cycle index for S_4 acting on (a) the ordered pairs of distinct elements
of {1,...,4}, (b) the subsets of {1,...,4}. Hence enumerate the (a) loopless digraphs, 
(b) families of sets, on four points up to isomorphism, by number of edges or sets. 


6. Let the cyclic group of prime order p generated by g act on the set of all p-tuples
of elements from {1, ..., n} by the rule

    (x_1, ..., x_p)g = (x_p, x_1, ..., x_{p-1}).

By counting orbits, prove that n^p ≡ n (mod p).


7. Let $F(s_1,\ldots,s_n)$ be a polynomial in $n$ variables $s_1,\ldots,s_n$. For any polynomial $a(t)$, let $F[a]$ denote $F(a(t), a(t^2), \ldots, a(t^n))$. Can you show that, if $F[a] = 0$ for all polynomials $a(t)$ with non-negative integer coefficients, then $F = 0$? Deduce the principle used in the proof of (15.5.2) from this assertion.

Though the uncarved block is small
No one in the world dare claim its allegiance. 


Lao Tse, Tao Te Ching (ca. 500 BC) 


TOPICS: Designs

TECHNIQUES: Matrix and determinant techniques; Cauchy’s Inequality (the ‘variance trick’)

ALGORITHMS:

CROSS-REFERENCES: Steiner triple systems (Chapter 8), finite geometries (Chapter 9), regular families (Chapter 7), PIE (Chapter 5), Latin squares, SDRs (Chapter 6)


Designs are a generalisation of Steiner triple systems. There is no hope of deciding the values of the parameters for which designs exist (as we did for STSs). We will develop just enough theory to resolve the question for small designs, and say a little about some general classes.


16.1. Definitions and examples 

Let $t, k, v, \lambda$ be integers with $t < k < v$ and $\lambda > 0$. A $t$-$(v,k,\lambda)$ design, or $t$-design with parameters $(v,k,\lambda)$, is a pair $(X,\mathcal{B})$, where $X$ is a set of $v$ points, and $\mathcal{B}$ is a collection of $k$-subsets of $X$ called blocks, with the property that any $t$ points are contained in exactly $\lambda$ blocks.


EXAMPLES. 
1. A non-trivial Steiner triple system is a 2-(v, 3,1) design, by definition. 


2. A 2-(6, 3, 2) design is constructed as follows. Take the six points to consist of a pentagon and the point at the centre; the blocks consist of all triangles formed from these points which contain exactly one edge of the pentagon. (So, if the vertices of the pentagon are 1, ..., 5, and 0 is the centre, the blocks are 012, 023, 034, 045, 051, 124, 235, 341, 452, 513.) The 2-design property is easily checked by inspection (Fig. 16.1(a)).

3. Here is a 2-(7, 3, 2) design. The points are the vertices of a regular heptagon; the blocks are all the scalene triangles that can be formed from these points (Fig. 16.1(b)). It is easily checked that any edge or diagonal lies in just two scalene triangles, one the mirror image of the other. In fact, there are two shapes of scalene triangles, one the mirror image of the other; if we take the triangles of one shape we get a 2-(7, 3, 1) design (a Steiner triple system).


Fig. 16.1. (a) 2-(6,3,2) and (b) 2-(7,3,2) designs

4. Here is a 3-(8, 4, 1) design. The points are the vertices of a cube. There are three types of blocks:

(i) a face (six of these);

(ii) two opposite edges (six of these);

(iii) an inscribed regular tetrahedron (two of these).

Again, the proof is by checking (Fig. 16.2).
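
The "checking" in Examples 2 to 4 can be delegated to a machine. The following is a minimal sketch, not part of the original text: the helper `design_lambda` and the encodings (pentagon vertices 1..5 with centre 0, heptagon vertices as residues mod 7, cube vertices as 3-bit integers) are my own illustrative choices.

```python
from itertools import combinations

def design_lambda(points, blocks, t):
    """Return lambda if every t-subset lies in the same number of blocks, else None."""
    counts = {sum(1 for b in blocks if b.issuperset(T))
              for T in combinations(points, t)}
    return counts.pop() if len(counts) == 1 else None

# Example 2: pentagon 1..5 with centre 0; keep triangles with exactly one pentagon edge.
edges = [{i, i % 5 + 1} for i in range(1, 6)]
example2 = [frozenset(T) for T in combinations(range(6), 3)
            if sum(e <= set(T) for e in edges) == 1]

# Example 3: heptagon on the residues mod 7; scalene means three distinct chord lengths.
def chord(a, b):
    d = (a - b) % 7
    return min(d, 7 - d)

example3 = [frozenset(T) for T in combinations(range(7), 3)
            if len({chord(a, b) for a, b in combinations(T, 2)}) == 3]
# Triangles of one shape only: the rotations of {0, 1, 3}.
one_shape = [frozenset({i, (i + 1) % 7, (i + 3) % 7}) for i in range(7)]

# Example 4: cube vertices as 3-bit integers 0..7.
faces = {frozenset(x for x in range(8) if (x >> axis) & 1 == bit)
         for axis in range(3) for bit in (0, 1)}
opposite_edges = {frozenset({x, x ^ (1 << axis), x ^ 7, x ^ 7 ^ (1 << axis)})
                  for x in range(8) for axis in range(3)}
tetrahedra = {frozenset(x for x in range(8) if bin(x).count("1") % 2 == parity)
              for parity in (0, 1)}
example4 = faces | opposite_edges | tetrahedra

print(design_lambda(range(6), example2, 2), len(example2))
print(design_lambda(range(7), example3, 2), len(example3))
print(design_lambda(range(7), one_shape, 2), len(one_shape))
print(design_lambda(range(8), example4, 3), len(example4))
```

Each printed pair is the value of λ found by brute force together with the number of blocks, which can be compared with the parameters claimed in the examples.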


Fig. 16.2. A 3-(8,4,1) design

If these examples suggest to you a connection between design theory and geometry, I have not wholly misled you! Now we develop some theory.


(16.1.1) Proposition. The number $b$ of blocks of a $t$-$(v,k,\lambda)$ design is given by
$$b = \lambda\binom{v}{t}\Big/\binom{k}{t}.$$

PROOF. A standard double count, of pairs $(T,B)$, where $T$ is a $t$-set of points, $B$ a block, with $T \subseteq B$: there are $\binom{v}{t}$ choices of $T$, each contained in $\lambda$ blocks; and there are $b$ blocks, each containing $\binom{k}{t}$ $t$-subsets.

We always use $b$ for the number of blocks.

(16.1.2) Proposition. Let $(X,\mathcal{B})$ be a $t$-$(v,k,\lambda)$ design. Given $s < t$, let $S$ be an $s$-subset of $X$. Let $X' = X \setminus S$, and
$$\mathcal{B}' = \{B \setminus S : S \subseteq B,\ B \in \mathcal{B}\}$$
(i.e., take all blocks containing $S$, and remove $S$ from them). Then $(X',\mathcal{B}')$ is a $(t-s)$-$(v-s,k-s,\lambda)$ design.

PROOF. There are $v-s$ points, and each block contains $k-s$ of them. Let $Y$ be a subset of $X \setminus S$ of size $t-s$. Then $Y \cup S$ is a $t$-subset of $X$, so lies in $\lambda$ blocks in $\mathcal{B}$; removing $S$, we get $\lambda$ blocks of $\mathcal{B}'$ containing $Y$.

The design $(X',\mathcal{B}')$ is called the derived design of $(X,\mathcal{B})$ with respect to the set $S$ (or, with respect to the point $x$, if $S = \{x\}$ is a singleton).


(16.1.3) Corollary. For $s < t$, a $t$-$(v,k,\lambda)$ design is also an $s$-$(v,k,\lambda_s)$ design, where
$$\lambda_s = \lambda\binom{v-s}{t-s}\Big/\binom{k-s}{t-s}.$$

PROOF. $\lambda_s$ is the number of blocks of a $(t-s)$-$(v-s,k-s,\lambda)$ design.
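
Since every $\lambda_s$ counts blocks, each must be an integer, which gives a quick feasibility test for a parameter set. Here is a small sketch of the formula of (16.1.3); the function names are hypothetical, chosen only for this illustration.

```python
from fractions import Fraction
from math import comb

def lambda_s(t, v, k, lam, s):
    """lambda_s = lam * C(v-s, t-s) / C(k-s, t-s), kept as an exact fraction."""
    return Fraction(lam * comb(v - s, t - s), comb(k - s, t - s))

def lambda_sequence(t, v, k, lam):
    """The values lambda_0 = b, lambda_1 = r, ..., lambda_t = lam."""
    return [lambda_s(t, v, k, lam, s) for s in range(t + 1)]

def passes_divisibility(t, v, k, lam):
    """Necessary (not sufficient) condition: every lambda_s is an integer."""
    return all(x.denominator == 1 for x in lambda_sequence(t, v, k, lam))

print(lambda_sequence(3, 8, 4, 1))      # b, r, lambda_2, lambda_3 for a 3-(8,4,1) design
print(passes_divisibility(2, 8, 3, 1))  # would a 2-(8,3,1) design pass the test?
```

Passing the test does not show that a design exists; failing it shows that none can.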


EXAMPLE. Consider the 3-(8, 4, 1) design we constructed earlier. If we choose one point of this design and remove it from all blocks containing it, we get a 2-(7, 3, 1) design, i.e., an STS of order 7:


Fig. 16.3. A derived design 
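
The derived design of this example can be computed directly. In the sketch below I assume, for brevity, a coordinate description of the cube design that is my own and not taken from the text: label the vertices by the 3-bit integers 0..7, and take as blocks the 4-sets whose bitwise XOR is zero (each face, pair of opposite edges, and inscribed tetrahedron has this property, and there are 14 such sets).

```python
from itertools import combinations

# Assumed encoding of the 3-(8,4,1) design: 4-subsets of {0..7} with XOR equal to 0.
blocks = [frozenset(B) for B in combinations(range(8), 4)
          if B[0] ^ B[1] ^ B[2] ^ B[3] == 0]

def derived(blocks, S):
    """Blocks containing S, with S removed, as in (16.1.2)."""
    S = frozenset(S)
    return [B - S for B in blocks if S <= B]

d = derived(blocks, {0})
pair_counts = {sum(1 for B in d if {x, y} <= B)
               for x, y in combinations(range(1, 8), 2)}
print(len(blocks), len(d), pair_counts)
```

The set `pair_counts` collects, over all pairs of the seven remaining points, the number of derived blocks containing the pair; a single value there means the derived structure is a 2-design with that λ.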


16.2. To repeat or not to repeat? 


We have defined designs in such a way that the blocks form a set of $k$-subsets of the point set; that is, each $k$-set is either a block or not. There is, however, a more general notion, in which a given subset is allowed to occur more than once as a block. If the ‘multiplicity’ of $B$ is $\mu$, then $B$ contributes $\mu$ to the total number of blocks containing each of its $t$-element subsets.¹

In statistical design, where this notion first arose, nothing is lost by allowing so-called repeated blocks. Imagine that we are testing a number $v$ of varieties of fertiliser. In each experimental trial, we can take $k$ of these varieties and compare them. In order to evaluate the results, it is desirable that each pair of varieties should be compared in the same number ($\lambda$, say) of trials. So the experimental design should be a 2-$(v,k,\lambda)$ design. But the experiment will be just as effective, and the cost will be less, if we can use the same $k$-set as a block more than once: the trial need only be performed once, and the results repeated the appropriate number of times.


¹ Hughes and Piper call these ‘designs’ t-structures.
What we lose, in fact, is an interesting mathematical problem. Of course, if $\mathcal{B}$ consists of all the $k$-subsets of $X$, then $(X,\mathcal{B})$ is a $t$-design for any $t < k$, a so-called trivial design. Thus, for a non-trivial design, we require that some $k$-subset does not occur as a block. Now we can prove the following existence theorem with elementary linear algebra:


(16.2.1) Proposition. Let $t, k, v$ be given with $t < k < v - t$. If repeated blocks are allowed, then there exists a non-trivial $t$-$(v,k,\lambda)$ design, for some $\lambda$.


PROOF. We define a matrix $M = (m_{T,K})$ as follows. $M$ is a $\binom{v}{t} \times \binom{v}{k}$ matrix, whose rows are indexed by the $t$-subsets of $X = \{1,\ldots,v\}$, and whose columns are indexed by the $k$-subsets; the $(T,K)$ entry $m_{T,K}$ is equal to 1 if $T \subseteq K$, and 0 otherwise.

Since $t < k < v - t$, we have $\binom{v}{t} < \binom{v}{k}$, so $M$ has more columns than rows. Thus, the columns of $M$ are linearly dependent over $\mathbb{Q}$: there are rational numbers $a_K$, for $K$ a $k$-subset of $X$, not all zero, such that $\sum a_K c(K) = 0$, where $c(K)$ is the column indexed by $K$. Multiplying up by the least common multiple of the denominators of these rationals, we can assume that all $a_K$ are integers. Clearly some are positive and some negative; let $-d$ be the least. Now $a_K + d \ge 0$ for all $K$, and $a_K + d = 0$ for some $K$.

Consider the ‘design’ in which the block $K$ is repeated $a_K + d$ times. (Thus, some $k$-sets do not occur; the others occur a positive integral number of times.) We claim that this is indeed a $t$-design. Take a $t$-subset $T$. To find the number of blocks containing $T$, we add the multiplicities of the $k$-subsets $K$ for which $m_{T,K} = 1$. This number is
$$\sum_{|K|=k} m_{T,K}(a_K + d) = \sum_{|K|=k} m_{T,K}\,d = \binom{v-t}{k-t}d,$$
the first equality because $\sum m_{T,K}a_K = 0$ (we chose a linear dependence relation between columns), the second because $T$ lies in $\binom{v-t}{k-t}$ subsets of size $k$.

So we have a ‘$t$-design’ with $\lambda = \binom{v-t}{k-t}d$.


On the other hand, if we do not allow repeated blocks, the existence question for $t$-designs is much more difficult. Only in the last few years has it been shown by Luc Teirlinck that non-trivial $t$-designs exist for all values of $t$; his designs have $k = t+1$ and $v - t$ divisible by a quite rapidly growing function of $t$. So the existence question is far from settled!

Note that 1-designs (without repeated blocks) exist for all ‘feasible’ parameters; this was shown in Section 7.4.² So we concentrate on the cases $t \ge 2$. For $t = 2$, a powerful existence theory has been developed by Richard Wilson. From (16.1.3) we see that a necessary condition for the existence of a 2-$(v,k,\lambda)$ design is that the numbers $r = (v-1)\lambda/(k-1)$ and $b = v(v-1)\lambda/k(k-1)$ are integers; in other words,
$$(v-1)\lambda \equiv 0 \pmod{k-1},$$
$$v(v-1)\lambda \equiv 0 \pmod{k(k-1)}.$$
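
These two congruences are easy to test mechanically. A minimal sketch follows (the name `admissible` is my own label for "satisfies both divisibility conditions"); it lists the orders $v < 40$ that pass the test for $k = 3$, $\lambda = 1$.

```python
def admissible(v, k, lam):
    """Check the two divisibility conditions for a 2-(v, k, lam) design."""
    return ((v - 1) * lam % (k - 1) == 0
            and v * (v - 1) * lam % (k * (k - 1)) == 0)

sts_orders = [v for v in range(4, 40) if admissible(v, 3, 1)]
print(sts_orders)
```

The conditions are only necessary in general, so a parameter set that passes may still fail to have a design.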


² These designs were called regular families in Chapter 7.
For Steiner triple systems, these necessary conditions assert $v - 1 \equiv 0 \pmod 2$ and $v(v-1) \equiv 0 \pmod 6$; we saw in Chapter 8 that these conditions are also sufficient.


Wilson showed: 


(16.2.2) Theorem. There is a function $f(k,\lambda)$ such that, if $v > f(k,\lambda)$ and the above necessary conditions are satisfied, then a 2-$(v,k,\lambda)$ design (possibly having repeated blocks) exists.


Wilson’s proof is a mix of direct and recursive constructions, like the existence proof (8.1.2) for Steiner triple systems but rather more complicated. In the case $\lambda = 1$, of course, the resulting designs have no repeated blocks. Nobody has succeeded in proving a similar theorem for higher values of $t$.


In the remainder of this chapter, we assume no repeated blocks. 


16.3. Fisher’s Inequality 


We are most interested in 2-designs. By (16.1.3), a 2-design is also a 1-design; that is, a point lies in a constant number $r$ of blocks. Now we have $r(k-1) = (v-1)\lambda$ (the formula for $r = \lambda_1$ from (16.1.3)), and $vr = bk$ (applying (16.1.1) to the 1-$(v,k,r)$ design). From these, the result of (16.1.1), viz. $b\binom{k}{2} = \lambda\binom{v}{2}$, follows.


An important result about 2-designs is Fisher’s Inequality:³


(16.3.1) Fisher’s Inequality. In a 2-$(v,k,\lambda)$ design, $b \ge v$ (i.e., there are at least as many blocks as points).


PROOF. Consider first the case $\lambda = 1$. Take a point $x$ and a block $B$ with $x \notin B$. For each $y \in B$, there is a unique block $B_y$ containing $x$ and $y$; all these blocks are different. (For if $B_y = B_z$, then $B_y$ and $B$ are blocks containing $y$ and $z$, and so are equal; but $x \in B_y$, $x \notin B$, a contradiction.) So the number $r$ of blocks through $x$ is at least the number $k$ of points on $B$, i.e., $r \ge k$. Since $bk = vr$, it follows that $b \ge v$.


For the general case, we offer two proofs, the first by linear algebra, the second illustrating a very useful counting argument, variously called the variance trick or Cauchy’s Inequality.⁴

FIRST PROOF. Let $X = \{x_1,\ldots,x_v\}$, $\mathcal{B} = \{B_1,\ldots,B_b\}$, and let $M$ be the $v \times b$ matrix whose $(i,j)$ entry is 1 if $x_i \in B_j$, 0 otherwise. $M$ is called the incidence matrix of the design. The use of incidence matrices introduces algebraic methods into design theory, to good effect.


³ R. A. Fisher was one of the most influential statisticians of the twentieth century. In accordance with the remarks in the last section, his inequality is bad news for statisticians: it shows that, to achieve balance between the treatments, at least as many trials are required as the number of varieties being tested.

⁴ In geometric language, this inequality asserts that the inner product of two real vectors does not exceed the product of their lengths.
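CLAIM 1. $MM^T$ is the $v \times v$ matrix with diagonal entries $r$ and off-diagonal entries $\lambda$. For
$$(MM^T)_{ik} = \sum_{j=1}^{b} M_{ij}(M^T)_{jk} = \sum_{j=1}^{b} M_{ij}M_{kj},$$
which is the number of blocks containing $x_i$ and $x_k$. This number is $r$ if $i = k$, and $\lambda$ if $i \ne k$, proving the claim.

CLAIM 2. $\det(MM^T) = rk(r-\lambda)^{v-1}$.

We use the fact that adding a multiple of one row (or column) to another doesn’t change the determinant, while multiplying a row (or column) by a constant $c$ multiplies the determinant by $c$. So we have
$$\begin{aligned}
\det(MM^T) &= \det\begin{pmatrix} r & \lambda & \cdots & \lambda\\ \lambda & r & \cdots & \lambda\\ \vdots & \vdots & \ddots & \vdots\\ \lambda & \lambda & \cdots & r \end{pmatrix}\\
&= \det\begin{pmatrix} r+(v-1)\lambda & r+(v-1)\lambda & \cdots & r+(v-1)\lambda\\ \lambda & r & \cdots & \lambda\\ \vdots & \vdots & \ddots & \vdots\\ \lambda & \lambda & \cdots & r \end{pmatrix}\\
&= (r+(v-1)\lambda)\det\begin{pmatrix} 1 & 1 & \cdots & 1\\ \lambda & r & \cdots & \lambda\\ \vdots & \vdots & \ddots & \vdots\\ \lambda & \lambda & \cdots & r \end{pmatrix}\\
&= rk\det\begin{pmatrix} 1 & 1 & \cdots & 1\\ 0 & r-\lambda & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & r-\lambda \end{pmatrix}\\
&= rk(r-\lambda)^{v-1}.
\end{aligned}$$
(The second equality is obtained by adding all other rows to the first, and the fourth by subtracting $\lambda$ times the first row from all other rows and using $r + (v-1)\lambda = rk$.)

Hence $MM^T$ is non-singular (note that $r > \lambda$, since $r(k-1) = \lambda(v-1)$ and $k < v$). Since it is $v \times v$, its rank is $v$. But if $b < v$ then $\mathrm{rank}(M) \le b < v$, and so $\mathrm{rank}(MM^T) < v$, a contradiction. So $b \ge v$.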
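
Both claims can be checked numerically on the 2-(6, 3, 2) design of Section 16.1, Example 2. The following sketch is only an illustration: the exact-arithmetic determinant routine `det` is a generic Gaussian elimination written for this purpose, not a method from the text.

```python
from fractions import Fraction

# Blocks of the 2-(6,3,2) design from Example 2, written as digit strings.
blocks = [set(map(int, w))
          for w in "012 023 034 045 051 124 235 341 452 513".split()]
v, k, lam = 6, 3, 2
r = lam * (v - 1) // (k - 1)

# Incidence matrix M (points x blocks) and the product M M^T.
M = [[1 if x in B else 0 for B in blocks] for x in range(v)]
gram = [[sum(M[i][j] * M[h][j] for j in range(len(blocks))) for h in range(v)]
        for i in range(v)]

def det(matrix):
    """Determinant by Gaussian elimination over the rationals."""
    a = [[Fraction(x) for x in row] for row in matrix]
    n = len(a)
    result = Fraction(1)
    for c in range(n):
        pivot = next((i for i in range(c, n) if a[i][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            result = -result
        result *= a[c][c]
        for i in range(c + 1, n):
            factor = a[i][c] / a[c][c]
            for j in range(c, n):
                a[i][j] -= factor * a[c][j]
    return result

print(gram[0])                               # first row of M M^T
print(det(gram), r * k * (r - lam) ** (v - 1))
```

The two numbers on the last line should agree, and being non-zero they confirm that $MM^T$ has rank $v$ for this design.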


SECOND PROOF. This proof just involves counting, but has the advantage that it more easily gives us information about what happens when the bound is met.

Let $B$ be any block. For $i = 0,\ldots,k$, let $n_i$ denote the number of blocks $B' \ne B$ for which $|B \cap B'| = i$. Now we have the following equations:
$$\sum_{i=0}^{k} n_i = b-1,$$
$$\sum_{i=0}^{k} i\,n_i = k(r-1),$$
$$\sum_{i=0}^{k} i(i-1)\,n_i = k(k-1)(\lambda-1).$$

(The first equation simply counts blocks different from $B$. The second counts pairs $(x,B')$ where $x \in B$ and $B' \ne B$ with $x \in B'$: for each of the $k$ points of $B$, there are $r-1$ further blocks containing it. The third equation counts triples $(x_1,x_2,B')$, where $x_1,x_2$ are two points of $B$ and $B'$ is another block containing both: there are $k$ choices for $x_1$, $k-1$ for $x_2$, and $\lambda-1$ further blocks containing both.) From these equations, we obtain
$$\sum_{i=0}^{k} i^2 n_i = k(r-1) + k(k-1)(\lambda-1),$$
and hence
$$\sum_{i=0}^{k} (x-i)^2 n_i = (b-1)x^2 - 2k(r-1)x + \bigl(k(r-1) + k(k-1)(\lambda-1)\bigr),$$
where $x$ is an indeterminate. This equation defines a quadratic function of $x$. From the left-hand side, we see that it is positive semi-definite, that is, its value is at least 0 for all real $x$. Hence the discriminant of the quadratic form on the right must be negative or zero; that is,
$$k^2(r-1)^2 - (b-1)k\bigl((r-1) + (k-1)(\lambda-1)\bigr) \le 0.$$

We simplify this by expressing it in terms of the parameters $v, k, r$, using the equations $bk = vr$ and $r(k-1) = \lambda(v-1)$. Multiply by $v-1$:
$$k^2(r-1)^2(v-1) - (vr-k)(r-k)(v-1) - (vr-k)r(k-1)^2 \le 0.$$
After some manipulation, this becomes
$$(k-r)\,r\,(v-k)^2 \le 0.$$
Since $r > 0$ and $(v-k)^2 > 0$, we must have $k \le r$. Using $vr = bk$, this is equivalent to $b \ge v$, as required.

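
As a sanity check on the counting, here is a short sketch that computes the numbers $n_i$ for one block of the 2-(6, 3, 2) design of Section 16.1 and tests the three equations, and then confirms the "some manipulation" step as a polynomial identity over a grid of integer values. The variable names are illustrative only.

```python
# Blocks of the 2-(6,3,2) design from Section 16.1, Example 2.
blocks = [frozenset(map(int, w))
          for w in "012 023 034 045 051 124 235 341 452 513".split()]
v, k, lam = 6, 3, 2
b = len(blocks)
r = lam * (v - 1) // (k - 1)

B = blocks[0]
# n[i] = number of other blocks meeting B in exactly i points.
n = [sum(1 for C in blocks if C != B and len(B & C) == i) for i in range(k + 1)]
print(n)

eq1 = sum(n) == b - 1
eq2 = sum(i * n[i] for i in range(k + 1)) == k * (r - 1)
eq3 = sum(i * (i - 1) * n[i] for i in range(k + 1)) == k * (k - 1) * (lam - 1)
print(eq1, eq2, eq3)

def lhs(v, k, r):
    return (k**2 * (r - 1)**2 * (v - 1) - (v*r - k) * (r - k) * (v - 1)
            - (v*r - k) * r * (k - 1)**2)

def rhs(v, k, r):
    return (k - r) * r * (v - k)**2

identity_holds = all(lhs(V, K, R) == rhs(V, K, R)
                     for V in range(1, 12) for K in range(1, 12) for R in range(1, 12))
print(identity_holds)
```

The grid check treats $v, k, r$ as free integers, so it tests the algebraic identity itself rather than any property of designs.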
What happens if equality holds? 


(16.3.2) Theorem. For a 2-$(v,k,\lambda)$ design with $k < v$, the following are equivalent:
(a) $b = v$;
(b) $r = k$;
(c) any two blocks meet in $\lambda$ points;
(d) any two blocks meet in a constant number of points.


PROOF. Since $bk = vr$, (a) and (b) are equivalent. Clearly (c) implies (d). We show that (b) implies (c), and that (d) implies (b).

(d) ⇒ (b): If any two blocks meet in $\mu$ points, then $n_i = 0$ for $i \ne \mu$, and so $\sum (\mu - i)^2 n_i = 0$. This means that $x = \mu$ is a root of the quadratic form above, whose discriminant is thus equal to zero. Thus $(k-r)r(v-k)^2 = 0$, or $r = k$.

(b) ⇒ (c): If $r = k$, then $b = v$, and the quadratic form becomes $(v-1)x^2 - 2k(k-1)x + k(k-1)\lambda$; using $k(k-1) = (v-1)\lambda$, this becomes $(v-1)(x^2 - 2\lambda x + \lambda^2)$. Thus $x = \lambda$ is a root. Reversing the previous argument, we see that $n_i = 0$ for $i \ne \lambda$, and so every block meets $B$ in exactly $\lambda$ points. Since $B$ was arbitrary, (c) holds.

A 2-design satisfying the equivalent conditions of (16.3.2) is called a square 2-design.⁵ Its parameters $(v,k,\lambda)$ satisfy the equation
$$k(k-1) = (v-1)\lambda.$$


The Bruck-Ryser-Chowla Theorem gives a necessary condition for the existence 
of a square design. 


(16.3.3) Theorem. Suppose that a square 2-$(v,k,\lambda)$ design exists. Then:
(a) if $v$ is even, $k - \lambda$ is a square;
(b) if $v$ is odd, the equation
$$z^2 = (k-\lambda)x^2 + (-1)^{(v-1)/2}\lambda y^2$$
has a solution in integers $x, y, z$, not all zero.


PROOF. (a) From the first proof of Fisher’s Inequality (16.3.1), we see that the incidence matrix $M$ of the design satisfies
$$\det(M)^2 = \det(MM^T) = k^2(k-\lambda)^{v-1}.$$
So $|\det(M)| = k(k-\lambda)^{(v-1)/2}$. This is an integer; so, if $v$ is even, then $k - \lambda$ is a square.

(b) The second part is a generalisation of the Bruck-Ryser Theorem (9.5.2).⁶ The proof is almost identical, and I will not give it here. Instead, I show that, for projective planes, the conclusions of (9.5.2) and (16.3.3)(b) are identical. Let $\lambda = 1$, and set $k = n+1$, $v = n^2+n+1$. The diophantine equation is
$$z^2 = nx^2 + (-1)^{n(n+1)/2}y^2.$$
If $n \equiv 0$ or $3 \pmod 4$, then $n(n+1)/2$ is even, and the equation has the trivial solution $x = 0$, $y = z$; so the necessary condition is empty. If $n \equiv 1$ or $2 \pmod 4$, then $n(n+1)/2$ is odd, and the equation is $y^2 + z^2 = nx^2$. As explained in Section 9.8, this has a solution if and only if $n$ is the sum of two squares.
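
The projective-plane case of the criterion is easy to tabulate. The sketch below (function names are my own) lists the orders $n \le 30$ excluded by the condition just derived, and also applies part (a) to one parameter set with $v$ even.

```python
from math import isqrt

def is_sum_of_two_squares(n):
    """True if n = a^2 + c^2 for some non-negative integers a, c."""
    return any(isqrt(n - a * a) ** 2 == n - a * a for a in range(isqrt(n) + 1))

def excluded_by_bruck_ryser(n):
    """Order n is excluded if n = 1 or 2 (mod 4) and n is not a sum of two squares."""
    return n % 4 in (1, 2) and not is_sum_of_two_squares(n)

def passes_even_v_test(v, k, lam):
    """Part (a): for a square design with v even, k - lam must be a perfect square."""
    assert k * (k - 1) == (v - 1) * lam, "not the parameters of a square design"
    return v % 2 == 1 or isqrt(k - lam) ** 2 == k - lam

print([n for n in range(2, 31) if excluded_by_bruck_ryser(n)])
print(passes_even_v_test(22, 7, 2))   # is a square 2-(22,7,2) design possible?
```

Orders that survive the test are not thereby shown to exist; the criterion only rules parameter sets out.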


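The complement of a design $(X,\mathcal{B})$ is $(X,\overline{\mathcal{B}})$, where
$$\overline{\mathcal{B}} = \{X \setminus B : B \in \mathcal{B}\};$$
its blocks are the complements of all the blocks in $\mathcal{B}$.


(16.3.4) Proposition. The complement of a $t$-$(v,k,\lambda)$ design is a $t$-$(v,v-k,\overline{\lambda})$ design, where
$$\overline{\lambda} = \sum_{s=0}^{t} (-1)^s\binom{t}{s}\lambda_s.$$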


⁵ Other terms used are symmetric design or projective design.
⁶ It was proved by Chowla and Ryser a year after the Bruck-Ryser Theorem; hence the name.
PROOF. Clearly every block of the complement has $v - k$ points. Let $x_1,\ldots,x_t$ be points. Then the number of blocks of $(X,\overline{\mathcal{B}})$ containing $x_1,\ldots,x_t$ is equal to the number of blocks of $(X,\mathcal{B})$ containing none of $x_1,\ldots,x_t$. We find this number by using PIE (Section 5.1). Let $\mathcal{B}_i$ be the set of blocks containing $x_i$, and for $I \subseteq \{1,\ldots,t\}$, $\mathcal{B}_I = \bigcap_{i\in I}\mathcal{B}_i$. Then $\mathcal{B}_I$ is the set of blocks containing $x_i$ for all $i \in I$, so that $|\mathcal{B}_I| = \lambda_s$ if $|I| = s$. By PIE, the number of blocks containing none of $x_1,\ldots,x_t$ is
$$\sum_{I \subseteq \{1,\ldots,t\}} (-1)^{|I|}|\mathcal{B}_I| = \sum_{s=0}^{t} (-1)^s\binom{t}{s}\lambda_s,$$
since there are $\binom{t}{s}$ sets $I \subseteq \{1,\ldots,t\}$ with $|I| = s$.

Note that the complement of a square 2-design is a square 2-design.


EXAMPLE. Consider the complement of a 2-(7, 3, 1) design. This design has $\lambda_0 = b = 7$, $\lambda_1 = r = 3$, and $\lambda_2 = \lambda = 1$. So
$$\overline{\lambda} = 7 - 2\cdot 3 + 1 = 2,$$
and the complement is a 2-(7, 4, 2) design.
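
The formula of (16.3.4) and this example can be cross-checked against an explicit design. In the sketch below, the 2-(7, 3, 1) design is realised as the rotations of $\{0,1,3\}$ modulo 7, which is one convenient model and an assumption of this illustration; the function names are likewise my own.

```python
from itertools import combinations
from math import comb

def lambda_s(t, v, k, lam, s):
    # Integer division is exact whenever the parameters are those of a real design.
    return lam * comb(v - s, t - s) // comb(k - s, t - s)

def complement_lambda(t, v, k, lam):
    """The PIE sum of (16.3.4): sum over s of (-1)^s C(t,s) lambda_s."""
    return sum((-1) ** s * comb(t, s) * lambda_s(t, v, k, lam, s)
               for s in range(t + 1))

# One model of a 2-(7,3,1) design, and its complement.
fano = [frozenset({i, (i + 1) % 7, (i + 3) % 7}) for i in range(7)]
complement = [frozenset(range(7)) - B for B in fano]
pair_counts = {sum(1 for B in complement if {x, y} <= B)
               for x, y in combinations(range(7), 2)}

print(complement_lambda(2, 7, 3, 1), pair_counts)
print([complement_lambda(2, 7, 3, lam) for lam in (1, 2, 3, 4)])
```

The first line compares the formula with a direct count; the second applies the formula to the four values of λ that arise for $v = 7$, $k = 3$.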


A design is called trivial if every $k$-set of points is a block. (The set of all $k$-subsets is the block set of a $t$-$(v,k,\binom{v-t}{k-t})$ design.)


(16.3.5) Corollary. A $t$-$(v,k,\lambda)$ design with $k > v - t$ is trivial.


PROOF. Let $s = v - k$. Then $s < t$, so our design is an $s$-design, by (16.1.3). So its complement is also an $s$-design, with block size $s$. Now some $s$-set is contained in a block of this design, and so this design is trivial, as is its complement.

Recall from Chapter 9 the projective geometry $PG(n,q)$: its points are all the 1-dimensional subspaces of a vector space $V$ of dimension $n+1$ over $GF(q)$, and its $i$-flats are the $(i+1)$-dimensional subspaces, for $0 \le i \le n$. An $i$-flat can be identified with the set of points it contains.


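(16.4.1) Proposition. For $1 \le i \le n-1$, the points and $i$-flats in $PG(n,q)$ form a non-trivial 2-$\left(\begin{bmatrix}n+1\\1\end{bmatrix}_q, \begin{bmatrix}i+1\\1\end{bmatrix}_q, \begin{bmatrix}n-1\\i-1\end{bmatrix}_q\right)$ design, where $\begin{bmatrix}n\\k\end{bmatrix}_q$ is the Gaussian coefficient.


PROOF. The number of points, and the number of points in a block, are clear. Let $x, y$ be points. Then $\langle x,y\rangle$ is a 2-dimensional subspace of $V$. The $(i+1)$-dimensional subspaces containing it are in one-to-one correspondence with the $(i-1)$-dimensional subspaces of the quotient space $V/\langle x,y\rangle$, by the Third Isomorphism Theorem; the quotient has dimension $n-1$, so there are $\begin{bmatrix}n-1\\i-1\end{bmatrix}_q$ of them.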


Note some special cases: 

(a) If $i = n-1$, the design is square. Its blocks are called hyperplanes.

(b) If $i = 1$ (the blocks are lines), then $\lambda = 1$. In particular, if $i = 1$ and $q = 2$, we have Steiner triple systems; these are the ‘projective triple systems’ of Chapter 8.

(c) The intersection of these cases, where $i = 1$, $n = 2$, consists of the projective planes $PG(2,q)$. More generally, any projective plane of order $q$ (not necessarily Desarguesian) is a 2-$(q^2+q+1, q+1, 1)$ design.

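
The parameters in (16.4.1) are quick to compute from the product formula for Gaussian coefficients. The following sketch uses hypothetical helper names `gaussian` and `pg_design`, and simply evaluates the three parameters for a few small cases.

```python
def gaussian(n, k, q):
    """Number of k-dimensional subspaces of an n-dimensional space over GF(q)."""
    if k < 0 or k > n:
        return 0
    num = den = 1
    for i in range(k):
        num *= q ** (n - i) - 1
        den *= q ** (i + 1) - 1
    return num // den  # the quotient is always an integer

def pg_design(n, i, q):
    """Parameters (v, k, lambda) of the points and i-flats of PG(n, q)."""
    return (gaussian(n + 1, 1, q), gaussian(i + 1, 1, q), gaussian(n - 1, i - 1, q))

print(pg_design(2, 1, 3))   # projective plane of order 3
print(pg_design(3, 1, 2))   # lines of PG(3,2): a projective triple system
print(pg_design(3, 2, 2))   # planes of PG(3,2): a square design
```

In the square case one can check by hand that $k(k-1) = (v-1)\lambda$ holds for the printed parameters.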
In a similar way, affine spaces give designs, but with a twist:


(16.4.2) Proposition. For $1 \le i < n$, if either $q > 2$ or $i > 1$, the points and $i$-flats in $AG(n,q)$ form a non-trivial 2-$\left(q^n, q^i, \begin{bmatrix}n-1\\i-1\end{bmatrix}_q\right)$ design. If $q = 2$ and $i > 1$, it is even a 3-$\left(2^n, 2^i, \begin{bmatrix}n-2\\i-2\end{bmatrix}_2\right)$ design.

The proof is similar. The reason for the exclusion is that, if $q = 2$ and $i = 1$, then lines have just two points and the design is trivial. Since lines have only two points when $q = 2$, any three points are independent, and span a plane; this is why 3-designs are obtained when $q = 2$. (For $q > 2$, some triples of points are collinear and others are not.)


Once again, if $i = 1$ and $q > 2$, we obtain 2-designs with $\lambda = 1$ (the ‘affine triple systems’ of Chapter 8 in the case $q = 3$); and, if $i = 2$ and $q = 2$, we obtain 3-$(2^n, 4, 1)$ designs (Steiner quadruple systems). The case $i = n-1$ (blocks are hyperplanes) is also interesting; we’ll meet it again for $q = 2$ in Section 16.6. Also, any affine plane of order $n$ (Desarguesian or not) is a 2-$(n^2, n, 1)$ design. Now (9.5.7) can be re-phrased in the terminology of design theory as follows:


(16.4.3) Proposition. For any $n$, there exists a 2-$(n^2+n+1, n+1, 1)$ design if and only if there exists a 2-$(n^2, n, 1)$ design.


16.5. Small designs 


If we are trying to decide for which values of the parameters a design exists, we may clearly ignore trivial designs; so we may assume that $t < k < v - t$. But, if a design exists, then so does its complement; so it is enough to resolve the question for $t < k \le \frac12 v$.


Here is another construction of new designs from old, which doesn’t seem to have an official name; it is a sort of complementation but not to be confused with the operation defined above (before (16.3.4)). In this construction, the ‘no repeated blocks’ condition is crucial. Let $(X,\mathcal{B})$ be a non-trivial $t$-$(v,k,\lambda)$ design. Let $\mathcal{B}^*$ be the set of all $k$-subsets of $X$ which are not in $\mathcal{B}$ (not blocks of the original design). Then $(X,\mathcal{B}^*)$ is a $t$-$\left(v, k, \binom{v-t}{k-t} - \lambda\right)$ design. For any $t$-set lies in $\binom{v-t}{k-t}$ sets of size $k$ altogether; and $\lambda$ of these are blocks of the old design, the remainder blocks of the new design.

In a $t$-$(v,k,\lambda)$ design, the value of $\lambda$ is at most equal to the total number $\binom{v-t}{k-t}$ of $k$-subsets which contain a given $t$-subset. If $\lambda = \binom{v-t}{k-t}$, the design is trivial. Since the existence questions for $\lambda$ and $\binom{v-t}{k-t} - \lambda$ are equivalent, we need only settle the question for $0 < \lambda \le \frac12\binom{v-t}{k-t}$.

We will illustrate by finding all parameters of non-trivial 2-designs with $v \le 8$.

Since $2 = t < k < v - t$, we have $k \ge 3$ and $v \ge 6$. So we have to consider the values $v = 6, 7, 8$.

CASE $v = 6$. We have $k = 3$, and $0 < \lambda < \binom{4}{1} = 4$. The equations $r(k-1) = (v-1)\lambda$ and $vr = bk$ become $2r = 5\lambda$ and $3b = 6r$, or $b = 2r = 5\lambda$. The first equation shows that $\lambda$ is even. So $\lambda = 2$. We have a design with these parameters (Example 2 in Section 16.1).

CASE $v = 7$. We have $k = 3$ or 4, and by taking complements, it is enough to consider $k = 3$. Then $0 < \lambda < \binom{5}{1} = 5$, so $\lambda = 1, 2, 3$ or 4. From Example 3 in Section 16.1, there are designs with $\lambda = 1$ and $\lambda = 2$. By using the complementary sets of blocks, we obtain designs with $\lambda = 3$ and $\lambda = 4$. Now a short calculation shows that the complementary designs are 2-$(7, 4, \lambda)$ designs with $\lambda = 2, 4, 6, 8$.

CASE $v = 8$. Now $k = 3, 4$ or 5, and it is enough to consider $k = 3$ and $k = 4$.


SUBCASE $k = 3$. We have $2r = 7\lambda$, so $\lambda$ is even; and $3b = 8r$, so $r$, and hence $\lambda$, is divisible by 3. However, $0 < \lambda < 6$, so no such design exists. It follows that there is none with $k = 5$ either.


SUBCASE $k = 4$. This time we have $3r = 7\lambda$, and $b = 2r$; so $\lambda$ is a multiple of 3. Also, $0 < \lambda < \binom{6}{2} = 15$. So $\lambda = 3, 6, 9$ or 12, and the last two values can be deduced from the first two. The 3-(8, 4, 1) design of Section 16.1, Example 4 is also a 2-(8, 4, 3) design, by (16.1.3). The existence of a 2-(8, 4, 6) design is an exercise. The existence of the other two designs follows.
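
The case analysis above amounts to a small search, which can be reproduced as follows. This is a sketch under the same conventions as the text ($t = 2$, $2 < k < v - 2$, $0 < \lambda < \binom{v-2}{k-2}$); the function name `admissible` is my own, and the program only tests the divisibility conditions, not existence.

```python
from math import comb

def admissible(v, k, lam):
    """Divisibility conditions for a 2-(v, k, lam) design: r and b are integers."""
    return ((v - 1) * lam % (k - 1) == 0
            and v * (v - 1) * lam % (k * (k - 1)) == 0)

table = {(v, k): [lam for lam in range(1, comb(v - 2, k - 2))
                  if admissible(v, k, lam)]
         for v in range(6, 9) for k in range(3, v - 2)}

for (v, k), lams in sorted(table.items()):
    print(f"v={v} k={k}: lambda in {lams}")
```

For these small values every listed parameter set is realised by a design, as the text shows; the REMARK that follows explains why this cannot be expected in general.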


REMARK. So far, whenever a parameter set for a 2-design satisfies the divisibility 
conditions and the trivial inequalities, a design happens to exist. But this pattern 
does not continue. Some designs are excluded by Fisher’s inequality; some by more 
sophisticated theoretical results; and some by exhaustive computer search. For other 
values, great ingenuity has been used to construct designs. 


16.6. Project: Hadamard matrices 


How large can the determinant of a matrix with entries of bounded size be? This 
question was considered by Hadamard. In this section, we prove Hadamard’s 
theorem and investigate its somewhat surprising connection with design theory. 


(16.6.1) Hadamard’s Theorem. Let $A = (a_{ij})$ be an $n \times n$ real matrix whose entries satisfy $|a_{ij}| \le 1$ for all $i, j$. Then $|\det(A)| \le n^{n/2}$. Equality holds if and only if $a_{ij} = \pm 1$ for all $i, j$ and $AA^T = nI$.


PROOF. Our proof uses a geometric interpretation of the determinant: $|\det(A)|$ is the volume of the parallelepiped (in $n$-dimensional Euclidean space) whose sides are the rows of $A$. Now, if $|a_{ij}| \le 1$ for all $i, j$, then the Euclidean length of any row is at most $\sqrt{n}$. The volume of the parallelepiped is at most the product of the edge lengths, with equality if and only if the edges are mutually perpendicular. The inequality follows; and equality holds if and only if each row has length $\sqrt{n}$ (so all its entries are $\pm 1$ and $\sum_{j=1}^{n} a_{ij}^2 = n$ for all $i$) and any two rows are perpendicular (so $\sum_{j=1}^{n} a_{ij}a_{kj} = 0$ for $k \ne i$). The two summations are equivalent to $AA^T = nI$.


A matrix $H$ which attains Hadamard’s bound is called a Hadamard matrix. Thus such a matrix has entries $\pm 1$ and satisfies $HH^T = nI$. We first derive a simple necessary condition on $n$ for the existence of a Hadamard matrix.


(16.6.2) Proposition. If a Hadamard matrix of order $n$ exists, then either $n = 1$ or 2, or $n \equiv 0 \pmod 4$.


PROOF. Observe first that, if $n \ge 2$, then any two rows of $H$ agree in $n/2$ positions and disagree in $n/2$ positions, since their inner product is 0. So $n$ is even. Also, if we change the sign of any column of a Hadamard matrix, the result is still a Hadamard matrix (most easily because $|\det(H)|$ is unchanged). By a series of such changes, we can arrange that all entries in the first row of $H$ are equal to $+1$.

Now suppose that $n \ge 3$, and consider the first three rows. Let $a, b, c, d$ be the numbers of columns in which the second and third rows have entries $(+1,+1)$, $(+1,-1)$, $(-1,+1)$ and $(-1,-1)$ respectively.

Considering inner products of the three pairs of rows, we have
$$a + b = c + d = n/2,$$
$$a + c = b + d = n/2,$$
$$a + d = b + c = n/2,$$
with solution $a = b = c = d = n/4$. So $n$ is a multiple of 4.

REMARK. We saw that the Hadamard property is invariant under changing signs of rows or columns. It is also invariant under permutations of rows or columns, and under transposition. Two Hadamard matrices are called equivalent if one can be transformed into the other by a sequence of operations of this type.

Now we turn to constructions. There is a simple recursive construction, the so-called tensor product or Kronecker product. If $A = (a_{ij})$ and $B$ are matrices, their tensor product is (in block form)
$$A \otimes B = \begin{pmatrix} a_{11}B & a_{12}B & \cdots\\ a_{21}B & a_{22}B & \cdots\\ \vdots & \vdots & \ddots \end{pmatrix}.$$
It can be checked that $(A \otimes B)(C \otimes D) = AC \otimes BD$. Now, if $H_1$ and $H_2$ are Hadamard matrices of orders $n_1$, $n_2$ respectively, then
$$(H_1 \otimes H_2)(H_1 \otimes H_2)^T = (H_1 \otimes H_2)(H_1^T \otimes H_2^T) = H_1H_1^T \otimes H_2H_2^T = n_1I_{n_1} \otimes n_2I_{n_2} = n_1n_2I_{n_1n_2},$$
so $H_1 \otimes H_2$ is a Hadamard matrix.

In particular, taking $H = \begin{pmatrix} +1 & +1\\ +1 & -1 \end{pmatrix}$, we obtain by successive tensor products a Hadamard matrix of order $2^n$ for any $n > 0$. These matrices are said to be of Sylvester type.
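
The Sylvester construction takes only a few lines to carry out. The sketch below represents matrices as plain lists of rows; `kron`, `is_hadamard` and `sylvester` are names chosen for this illustration, and `is_hadamard` tests the defining property (entries $\pm 1$ and $HH^T = nI$) directly.

```python
def kron(A, B):
    """Tensor (Kronecker) product of two matrices given as lists of rows."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

def is_hadamard(H):
    """Check that H is square with entries +1/-1 and H H^T = n I."""
    n = len(H)
    if any(len(row) != n or any(x not in (1, -1) for x in row) for row in H):
        return False
    return all(sum(H[i][j] * H[h][j] for j in range(n)) == (n if i == h else 0)
               for i in range(n) for h in range(n))

H2 = [[1, 1], [1, -1]]

def sylvester(m):
    """Hadamard matrix of order 2**m obtained by repeated tensor products with H2."""
    H = [[1]]
    for _ in range(m):
        H = kron(H, H2)
    return H

for row in sylvester(2):
    print(row)
print([is_hadamard(sylvester(m)) for m in range(5)])
```

The same `kron` can be applied to any two Hadamard matrices, not only to copies of the 2 × 2 matrix.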


Another class of examples consists of the matrices of Paley type. Let $q$ be a prime power congruent to $-1$ mod 4. Let $P(q) = (p_{ij})$ be the $(q+1) \times (q+1)$ matrix with rows and columns indexed by the elements of the field $GF(q)$ and a new symbol $\infty$, with
$$p_{ij} = \begin{cases} +1 & \text{if } i = \infty \text{ or } j = \infty;\\ -1 & \text{if } i = j \ne \infty;\\ +1 & \text{if } i - j \text{ is a non-zero square in } GF(q);\\ -1 & \text{if } i - j \text{ is a non-square in } GF(q). \end{cases}$$

It can be shown that $P(q)$ is a Hadamard matrix. (See Exercise 6.)
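
For a prime $q$ the field arithmetic is just arithmetic modulo $q$, so the Paley matrix is easy to build and test. The sketch below is restricted to primes (prime powers would need a proper finite-field implementation), and it assumes the sign convention $+1$ on the border, $-1$ on the diagonal, $+1$ for non-zero squares and $-1$ for non-squares, which is the convention consistent with Exercise 6. The function names are illustrative.

```python
def is_hadamard(H):
    """Check that H is square with entries +1/-1 and H H^T = n I."""
    n = len(H)
    if any(len(row) != n or any(x not in (1, -1) for x in row) for row in H):
        return False
    return all(sum(H[i][j] * H[h][j] for j in range(n)) == (n if i == h else 0)
               for i in range(n) for h in range(n))

def paley(q):
    """Paley-type matrix P(q) for a prime q with q % 4 == 3 (primes only)."""
    squares = {x * x % q for x in range(1, q)}   # non-zero squares mod q
    inf = q                                      # stands for the extra symbol
    def entry(i, j):
        if i == inf or j == inf:
            return 1
        if i == j:
            return -1
        return 1 if (i - j) % q in squares else -1
    index = [inf] + list(range(q))
    return [[entry(i, j) for j in index] for i in index]

for row in paley(3):
    print(row)
print([is_hadamard(paley(q)) for q in (3, 7, 11, 19, 23)])
```

A numerical check of this kind is of course no substitute for the proof asked for in Exercise 6.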


Of course, this construction can be used in conjunction with the tensor product, to construct Hadamard matrices of all orders which are the product of a power of 2 and numbers of the form $q_i + 1$, where $q_i$ are prime powers congruent to $-1 \pmod 4$. In particular, the existence question is settled for all multiples of 4 less than 36. But this is not the end; here is a construction for order 36.

Let $L$ be a Latin square of order 6 (Chapter 6). First we construct a graph $\Gamma$, a so-called Latin square graph. The vertices of $\Gamma$ are the ordered pairs $(i,j)$ for $1 \le i, j \le 6$, regarded as the cells of the Latin square. Two vertices are adjacent if the cells lie in the same row or column or contain the same entry. Now $H$ is the $36 \times 36$ matrix whose rows and columns are indexed by the vertices of $\Gamma$; the $(x,y)$ entry is $+1$ if $x$ and $y$ are adjacent in $\Gamma$, and $-1$ otherwise. Then $H$ is a Hadamard matrix (Exercise 7).
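
The construction can be tried out on a concrete Latin square. The sketch below uses the cyclic square with entry $(i + j) \bmod 6$ in cell $(i,j)$, with rows and columns numbered from 0; this particular square and the helper names are my own choices for the illustration.

```python
n = 6
# Cyclic Latin square of order 6 (an arbitrary choice for this illustration).
L = [[(i + j) % n for j in range(n)] for i in range(n)]
cells = [(i, j) for i in range(n) for j in range(n)]

def adjacent(x, y):
    """Distinct cells in the same row, the same column, or with the same entry."""
    (i, j), (h, l) = x, y
    return x != y and (i == h or j == l or L[i][j] == L[h][l])

H = [[1 if adjacent(x, y) else -1 for y in cells] for x in cells]

def is_hadamard(H):
    """Check that H is square with entries +1/-1 and H H^T = m I."""
    m = len(H)
    if any(len(row) != m or any(x not in (1, -1) for x in row) for row in H):
        return False
    return all(sum(H[a][c] * H[b][c] for c in range(m)) == (m if a == b else 0)
               for a in range(m) for b in range(m))

print(len(H), sum(1 for x in H[0] if x == 1), is_hadamard(H))
```

The printed values are the order of $H$, the number of $+1$ entries in its first row (the valency of $\Gamma$), and the result of the Hadamard test.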
It is conjectured that, for every positive integer $n$ divisible by 4, there exists a Hadamard matrix of order $n$. The conjecture is still open, though many values have been settled by increasingly ingenious constructions.


What has all this to do with designs? 


(16.6.3) Proposition. For $n > 4$, the following are equivalent:
(a) there exists a Hadamard matrix of order $n$;

(b) there exists a 3-$(n, \frac12 n, \frac14 n - 1)$ design;

(c) there exists a 2-$(n-1, \frac12 n - 1, \frac14 n - 1)$ design.


PROOF. (a) ⇒ (b): Let $H = (h_{ij})$ be a Hadamard matrix of order $n$. As in (16.6.2), by changing signs of some columns, we may assume that all entries in the first row are $+1$. Now, for $i = 2,\ldots,n$, let $B_i^+ = \{j : h_{ij} = +1\}$ and $B_i^- = \{j : h_{ij} = -1\}$. Each of these $2(n-1)$ sets has size $\frac12 n$. We claim that $(X,\mathcal{B})$ is a 3-$(n, \frac12 n, \frac14 n - 1)$ design, where $X = \{1,\ldots,n\}$ and $\mathcal{B} = \{B_i^+, B_i^- : i = 2,\ldots,n\}$.

The proof of (16.6.2) shows, in effect, that, given any three rows of $H$, there are exactly $n/4$ columns where they all agree. Dually, given any three columns (numbered $j_1, j_2, j_3$, say), there are $n/4$ rows where they all agree. One is the first row; so there are $\frac14 n - 1$ sets $B_i^\epsilon$ ($i = 2,\ldots,n$; $\epsilon = \pm 1$) containing $j_1, j_2, j_3$.

(b) ⇒ (c): Take the derived design with respect to a point.

(c) ⇒ (a): Let $D$ be a 2-$(n-1, \frac12 n - 1, \frac14 n - 1)$ design. Note that $D$ is square. Let $A$ be its incidence matrix. Now replace the entries 0 in $A$ by $-1$, and border $A$ with a row and column of $+1$s (the first, say); let $H$ be the resulting matrix.

Now any row of $A$ has $\frac12 n - 1$ entries 1, so any row of $H$ agrees with the first in $1 + (\frac12 n - 1) = \frac12 n$ positions. Also, any two rows of $A$ have entries $(1,1)$ in $\frac14 n - 1$ places, and $(0,0)$ in $\frac14 n$ places (since, by (16.3.4), the complement of $D$ is a 2-$(n-1, \frac12 n, \frac14 n)$ design); so the corresponding rows of $H$ agree in $1 + (\frac14 n - 1) + \frac14 n = \frac12 n$ places. Thus $HH^T = nI$.
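
The step from (a) to (b) to (c) can be followed on the Sylvester matrices of orders 8 and 16, whose first row and column are already all $+1$. This is a sketch only; `hadamard_3design`, `design_lambda` and `kron` are hypothetical helper names, and columns are numbered from 0 rather than 1.

```python
from itertools import combinations

def kron(A, B):
    """Tensor (Kronecker) product of two matrices given as lists of rows."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

def design_lambda(points, blocks, t):
    """Return lambda if every t-subset lies in the same number of blocks, else None."""
    counts = {sum(1 for b in blocks if b.issuperset(T))
              for T in combinations(points, t)}
    return counts.pop() if len(counts) == 1 else None

def hadamard_3design(H):
    """The sets B_i^+ and B_i^- for all rows after the first (first row must be all +1)."""
    n = len(H)
    assert all(x == 1 for x in H[0])
    return [frozenset(j for j in range(n) if H[i][j] == sign)
            for i in range(1, n) for sign in (1, -1)]

H2 = [[1, 1], [1, -1]]
H8 = kron(kron(H2, H2), H2)
H16 = kron(H8, H2)

for H in (H8, H16):
    n = len(H)
    blocks = hadamard_3design(H)
    derived = [B - {0} for B in blocks if 0 in B]   # derived design at column 0
    print(n, design_lambda(range(n), blocks, 3),
          design_lambda(range(1, n), derived, 2))
```

Each line of output gives $n$ followed by the values of λ found for the 3-design and for its derived 2-design; both should equal $\frac14 n - 1$.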


The designs arising in this theorem are called Hadamard designs. The designs of points and 
hyperplanes in projective and affine spaces over GF(2) are Hadamard 2-designs and 3-designs 
respectively (check the parameters given in Section 16.4 to see this); the corresponding Hadamard 
matrices are of Sylvester type. 


16.7. Exercises 


1. An extension of a $t$-$(v,k,\lambda)$ design $(X,\mathcal{B})$ is a $(t+1)$-$(v+1,k+1,\lambda)$ design $(Y,\mathcal{C})$ with a point $y$ such that its derived design with respect to $y$ is isomorphic to $(X,\mathcal{B})$. Prove that a necessary condition for a $t$-$(v,k,\lambda)$ design with $b$ blocks to have an extension is that $v+1$ divides $b(k+1)$. Hence show that, if a projective plane of order $n > 1$ has an extension, then $n = 2$, 4 or 10.⁷


REMARK. Each of the (unique) projective planes of orders 2 and 4 has a (unique) 
extension. We saw in Chapter 9 that the non-existence of a projective plane of order 
10 was established by a massive computation. In fact, a relatively small part of this 
computation showed that no projective plane of order 10 could have an extension, 
some years before the non-existence was proved. 


2. Prove that, up to equivalence, there is a unique Hadamard matrix of each of the 
orders 4, 8, 12; and prove that the corresponding Hadamard designs are unique up 
to isomorphism. 


⁷ This result is due to Dan Hughes.

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3. Let H be the (unique) Hadamard matrix of order 12. Let X be the set of 
columns of H. For 1 <i <j < 12, let B} be the set of columns where the 
i and j" rows agree, and Bj the set of columns where they disagree. Let 
B = {B}, B} :1 <i <j < 12}. Prove that (X, B) is a 5-(12, 6, 1) design. 

4. Show that the construction of the preceding problem, applied to an arbitrary
Hadamard matrix of order n, always gives a 3-design (if the order of the Hadamard
matrix is greater than 4), but is a 4-design only if n = 12.


5. A character χ of an abelian group A is a homomorphism from A to the
multiplicative group of non-zero complex numbers. The character table of A is the
matrix whose (i, j) entry is the value of the i-th character on the j-th element of A
(for some ordering of the group elements and the characters). Show that, if A is a
direct product of cyclic groups of order 2, then its character table is a Hadamard
matrix of Sylvester type.

6. Let P(q) be a Hadamard matrix of Paley type. It already has a row and column
of +1s, so we can read off the corresponding Hadamard 2-design: its points are the
elements of GF(q), and its blocks are the translates of the set of non-zero squares.
Show directly that this is a Hadamard 2-design, and deduce that P(q) is a Hadamard
matrix.

7. Prove that the Latin square construction gives a Hadamard matrix. 


8. Let $(X, \mathcal{B})$ be a Steiner triple system of order 15. For each triple $B \in \mathcal{B}$, let $S(B)$
be the set of all triples equal to or disjoint from $B$. Prove that

\[ (\mathcal{B}, \{S(B) : B \in \mathcal{B}\}) \]

is a Hadamard 2-design with 35 points.

9. Let $(X, \mathcal{B})$ be a square 2-$(v, k, \lambda)$ design, where $X = \{1, \ldots, v\}$ and
$\mathcal{B} = \{B_1, \ldots, B_v\}$. Prove that there is a Latin square of order $v$, having the property
that the set of entries occurring in the first $k$ rows and the $i$-th column is $B_i$, for
$i = 1, \ldots, v$. (Such a square is called a Youden square.)


10. Let $D = (X, \mathcal{B})$ be a $t$-$(v, k, \lambda)$ design, and let $x_1, \ldots, x_{t+1}$ be points of $X$.
Suppose that $\mu$ blocks contain all these points. Use PIE to show that the number of
blocks containing none of $x_1, \ldots, x_{t+1}$ is $N + (-1)^{t+1}\mu$, where $N$ depends on $t$, $v$, $k$
and $\lambda$ only.

Deduce that, if $t$ is even and $v = 2k + 1$, then

\[ (X \cup \{y\}, \{B \cup \{y\}, X \setminus B : B \in \mathcal{B}\}) \]

is an extension of $D$, where $y$ is a point not in $X$.
11. Find all possible parameters of non-trivial designs with 9 points. 


12. Show that the family of blocks of a square 2-(v, k, λ) design has at least
k(k — AyO-? SDRs. 


17. Error-correcting codes 


... flame of incandescent terror 
Of which the tongues declare 
The one discharge from sin and error 


T. S. Eliot, ‘Little Gidding’ (1942) 


TOPICS: Error-correction, minimum distance, linearity, bounds
TECHNIQUES: Linear algebra, projective geometry, number theory 
ALGORITHMS: Encoding, syndrome decoding 


CROSS-REFERENCES: Packing and covering (Chapter 8); projective 
geometry (Chapter 9); designs (Chapter 16) 


This chapter begins with an example involving ‘guessing’ a number on the basis of 
information about it, some of which is incorrect. It looks like a party trick, but 
in fact the ideas have great practical importance. Information of all kinds is sent 
through channels where it runs the risk of distortion: pictures of the planets in the 
solar system from space probes via radio links; musical performances via tapes and 
compact discs; instructions about how to build a living body via DNA molecules 
in genes; and so on. We can fancifully regard errors and distortion in the message 
as ‘nature lying to us’, and it is important to know how to identify and correct the 
errors. This is done by means of error-correcting codes, whose study could be seen 
as part of information theory but which has a high combinatorial content. 


17.1. Finding out a liar 


The panel game ‘Twenty Questions’, which we referred to in Chapter 4, involves one
player trying to guess something thought of by the other, being allowed to ask twenty
questions (each of which must have a yes-or-no answer) to gather information. It’s
clear that $2^{20}$ different objects can be distinguished. Since this number is slightly
greater than $10^6$, the game can be played with whole numbers, with the familiar
opening gambit, ‘Think of a number less than a million’.


What if the respondent lies? 

There is a simple scheme for guessing correctly a number between 0 and 15 with 
seven questions, where the respondent is allowed to lie once. The calculations can 
be done on the back of a small envelope, or in your head with a little practice. Since 
I can’t demonstrate it to you in this medium, I’ll explain how it works, and you can
try it out on someone else.

Give your respondent the following instructions.


Instructions 
Think of a number between 0 and 15. 
Now answer the following questions. 
You are allowed to lie once. 


1. Is the number 8 or greater?
2. Is it in the set {4, 5, 6, 7, 12, 13, 14, 15}?
3. Is it in the set {2, 3, 6, 7, 10, 11, 14, 15}?
4. Is it odd?
5. Is it in the set {1, 2, 4, 7, 9, 10, 12, 15}?
6. Is it in the set {1, 2, 5, 6, 8, 11, 12, 15}?
7. Is it in the set {1, 3, 4, 6, 8, 10, 13, 15}?


The response is a sequence of seven ‘yes’ or ‘no’ answers. Writing 1 for ‘yes’ and
0 for ‘no’, record it as a binary vector $v$ of length 7. Now multiply $v$ by the $7 \times 3$
matrix

\[
H = \begin{pmatrix}
0 & 0 & 1 \\
0 & 1 & 0 \\
0 & 1 & 1 \\
1 & 0 & 0 \\
1 & 0 & 1 \\
1 & 1 & 0 \\
1 & 1 & 1
\end{pmatrix}
\]

whose $i$-th row is the base 2 representation of $i$, for $1 \le i \le 7$. (The calculation
of $vH$ is done in GF(2).) The result is a binary vector $w = vH$ of length 3.
Alternatively, count the numbers of 1’s of $v$ which occur in the sets {4, 5, 6, 7},
{2, 3, 6, 7}, {1, 3, 5, 7} of coordinates respectively, recording 1 for odd and 0 for even
in each case. Now either $w = 0$, in which case no lie was told; or $w$ is the base 2
representation of $k$, where $1 \le k \le 7$, in which case the answer to the $k$-th question
was a lie. Thus, the vector $v$ of responses can be corrected. Then the first four
entries of $v$ form the base 2 representation of the chosen number.


We note in passing that no fewer than seven questions would suffice, no matter
how they were asked. For at the end, we know not only which of the 16 numbers
was chosen, but also which question was answered incorrectly (if any). If, say, six
questions sufficed, then we’d have identified one of $16 \cdot 7 = 112$ possibilities with
only 6 questions, a contradiction since $112 > 2^6$. (The factor 7 is for the 6 possible
positions of the lie and the possibility that no lie was told.) Note that, with 7
questions, we have distinguished $16 \cdot 8 = 2^7$ events; so we succeed with nothing to
spare (no wasted information is generated).

Why does it work? 

Let $C = \{c_0, c_1, \ldots, c_{15}\}$ be the set of sixteen 7-tuples of zeros and ones which
would be generated by truthful responses to the questions for each of the numbers
0, ..., 15. For example, the number 13 generates

\[ c_{13} = (1, 1, 0, 1, 0, 0, 1). \]

The crucial fact, which can be verified by checking all sixteen cases (there are
simpler ways), is that, for any $c \in C$, we have $cH = 0$, where $H$ is the matrix
defined above. (Note that the matrix $H$ has rank 3, so its null space has dimension
4; thus $C$ must be precisely the null space of $H$.) Let $e_i$ be the 7-tuple with 1 in the
$i$-th place and 0 in all other positions. Then, if the respondent chooses the number
$m$ and lies to the $k$-th question, the replies will form the vector $c_m + e_k$. (Adding $e_k$
changes the $k$-th coordinate and leaves the others unaltered.) Now we compute

\[ (c_m + e_k)H = c_mH + e_kH = e_kH, \]

which is the $k$-th row of $H$ and so, by definition, the base 2 representation of the
number $k$. So we have located the error. Once we correct it, we know the vector $c_m$.
On the other hand, if no lie was told, the response is just $c_m$, and we find $c_mH = 0$.
Now, the first four questions we asked about $m$ generate its base 2 representation;
so we can read this off from the first four digits of $c_m$, and then calculate $m$.
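
The decoding step can be tried out in a few lines of Python. This is only a sketch: the function names are invented for illustration, and the parity rules used for questions 5 to 7 are an assumption, chosen so that every truthful response c satisfies cH = 0.

```python
# Sketch of the seven-question trick; all names here are illustrative.

def truthful_answers(m):
    """Answers (1 = yes, 0 = no) to the seven questions for 0 <= m <= 15."""
    b = [(m >> s) & 1 for s in (3, 2, 1, 0)]  # base 2 digits, most significant first
    # Assumed parity checks for questions 5, 6, 7 (digit positions 2,3,4 / 1,3,4 / 1,2,4).
    return b + [b[1] ^ b[2] ^ b[3], b[0] ^ b[2] ^ b[3], b[0] ^ b[1] ^ b[3]]

def syndrome(v):
    """vH over GF(2), read as an integer: XOR of the positions k (1..7) where v has a 1."""
    s = 0
    for k, bit in enumerate(v, start=1):
        if bit:
            s ^= k  # row k of H is the base 2 representation of k
    return s

def guess(v):
    """Return (chosen number, position of the lie, or 0 if no lie was told)."""
    k = syndrome(v)
    v = list(v)
    if k:
        v[k - 1] ^= 1  # undo the lie
    return 8 * v[0] + 4 * v[1] + 2 * v[2] + v[3], k
```

For instance, flipping any single entry of `truthful_answers(13)` and passing the result to `guess` should return 13 together with the position that was flipped.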

Another important fact is that the correct response to the questions can itself
be generated by linear algebra. Let $v_m$ be the base 2 representation of the integer
$m$. Then you can check that $c_m = v_mG$, where

\[
G = \begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 1 & 1 \\
0 & 1 & 0 & 0 & 1 & 0 & 1 \\
0 & 0 & 1 & 0 & 1 & 1 & 0 \\
0 & 0 & 0 & 1 & 1 & 1 & 1
\end{pmatrix}.
\]

(This follows from the form of the questions. The first four questions ask whether
the first, ..., fourth digit in $v_m$ is equal to 1. The fifth question asks whether
positions 2, 3, 4 contain an odd number of 1’s, that is, whether their sum (mod 2)
is 1. Similarly for the sixth and seventh.)


The set $C$ is an example of an error-correcting code. We observe that:
• Any two members of $C$ differ in at least three positions.
For suppose, for example, that $c_m$ and $c_n$ differ only in positions $i$ and $j$. Then
$c_m + e_i = c_n + e_j$, and we couldn’t distinguish between the possibilities ‘$m$ chosen,
lie to $i$-th question’ and ‘$n$ chosen, lie to $j$-th question’. Similar reasoning would apply
if two members of $C$ differed in only one position.
Furthermore, since $C$ is the null space of $H$ (or the row space of $G$), we have:
• The sum of two members of $C$ is again in $C$.
We say that $C$ is linear.
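
Both observations can be confirmed by brute force. The sketch below assumes a particular 4 × 7 generator matrix (identity on the first four coordinates, followed by three parity columns consistent with the questions); the matrix and the helper names are illustrative rather than taken from the text.

```python
from itertools import combinations, product

# Assumed generator matrix: the rows are the responses for the numbers 8, 4, 2, 1.
G = [
    (1, 0, 0, 0, 0, 1, 1),
    (0, 1, 0, 0, 1, 0, 1),
    (0, 0, 1, 0, 1, 1, 0),
    (0, 0, 0, 1, 1, 1, 1),
]

def encode(v):
    """Compute vG over GF(2) for a 4-tuple v of base 2 digits."""
    return tuple(sum(v[i] * G[i][j] for i in range(4)) % 2 for j in range(7))

def distance(x, y):
    """Number of positions in which x and y differ."""
    return sum(a != b for a, b in zip(x, y))

code = {encode(v) for v in product((0, 1), repeat=4)}

# Smallest number of positions in which two distinct members differ.
min_distance = min(distance(x, y) for x, y in combinations(code, 2))

# Linearity: the coordinatewise sum (mod 2) of two members is again a member.
closed = all(
    tuple((a + b) % 2 for a, b in zip(x, y)) in code for x in code for y in code
)
```

With this choice of G, `encode((1, 1, 0, 1))` reproduces the response for the number 13 quoted earlier.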
In coding theory, it is customary to assume that information is to be sent in ‘words’
of fixed length n, each word being an n-tuple of ‘letters’ taken from an alphabet Q
of size q. By far the commonest case in applications is that when q = 2, and the
alphabet is taken to be GF(2) = {0, 1}; but this is not essential. Throughout this
chapter, n and q have these meanings.

We define Hamming space H(n, q) to be the set of all words of length n over the
fixed alphabet Q of size q. The structure on H will be a metric or distance function.
The motivation is that, if a word w is transmitted through a noisy channel, some of
the letters in the word may become changed; the more letters that are changed, the
further the received word is from the transmitted word. So we define the Hamming
distance between words v, w to be the smallest number of errors which could change
v into w; that is, the number of positions in which the entries in v and w differ.


Formally,
\[ d(v, w) = |\{i : v_i \ne w_i\}|. \]


(17.2.1) Proposition. (a) $d(v, w) \ge 0$, with equality if and only if $v = w$.
(b) $d(v, w) = d(w, v)$.
(c) $d(u, v) + d(v, w) \ge d(u, w)$.


PROOF. The first two assertions are obvious from the definition. For the third, we 
can argue informally as follows: it is possible to change u into w by changing it first 
into v, making altogether d(u,v) + d(v,w) coordinate alterations, so the smallest 
number of changes required does not exceed this number; or else, a more formal 
argument like this can be used. Observe that 


\[ \{i : u_i \ne w_i\} \subseteq \{i : u_i \ne v_i\} \cup \{i : v_i \ne w_i\}, \]


since if $u_i \ne w_i$ then certainly either $u_i \ne v_i$ or $v_i \ne w_i$. Now take the cardinality of
both sides, using the fact that


\[ |A \cup B| = |A| + |B| - |A \cap B| \le |A| + |B| \]


to get the desired inequality. 
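
As a small illustration, the Hamming distance is a one-line function, and the triangle inequality can be checked exhaustively on a tiny Hamming space. The choice of H(3, 3) and the function name are arbitrary, made only for this sketch.

```python
from itertools import product

def hamming_distance(v, w):
    """Number of positions in which two words of equal length differ."""
    if len(v) != len(w):
        raise ValueError("words must have the same length")
    return sum(1 for a, b in zip(v, w) if a != b)

# Exhaustive check of property (c) on H(3, 3): all 27 words over {0, 1, 2}.
words = list(product(range(3), repeat=3))
triangle_ok = all(
    hamming_distance(u, v) + hamming_distance(v, w) >= hamming_distance(u, w)
    for u in words for v in words for w in words
)
```

Words may be given as tuples, lists or strings; for example, `hamming_distance("10110", "11100")` counts the two positions where the strings differ.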


REMARK. A function d from X x X to the non-negative integers is called a metric on 
X if it satisfies conditions (a)—(c) of (17.2.1). The notion of a metric is an important 
unifying principle in mathematics; it is very likely that you have met it in analysis or 
topology, and we saw an application in the ‘twice-round-the-tree’ algorithm for the 
Travelling Salesman Problem in Section 11.7. The metric defined here on Hamming 
space is called, naturally enough, the Hamming metric; we also refer to the Hamming 
distance between two words. 


A code of length n over the alphabet Q is just a subset C of Hamming space 
H(n,q) which contains at least two words. The elements of the code are called 
codewords. The rationale is that we will perform error correction by restricting 
our transmissions to be members of the code C, rather than arbitrary words; if 
the members of C are sufficiently distinguishable (i.e., sufficiently far apart) then,
assuming that not too many errors occur, the received word still resembles the 
transmitted word more closely than any other codeword, and so we can recover the 
transmitted word. The reason for assuming that there are at least two words is that, 
in a one-word code, we would know in advance which word was transmitted, and 
so no information could possibly be sent! Now suppose that there are m possible 
messages that we might want to send, say $M_1, \ldots, M_m$. We encode a message by
associating a unique codeword with it; thus, we take a one-to-one function $e$ from
the set of messages to $C$, and encode $M_i$ as the codeword $e(M_i)$, which is then
transmitted. Of course, this requires that the number of codewords is at least as 
great as the number of messages; indeed, we may assume that every codeword 
corresponds to a message (by ignoring those which don’t). How do we decode? 

What is suggested in the preceding paragraph is the concept of nearest-neighbour
decoding. If the word $w$ is received, then we find the codeword $c \in C$ for which
$d(w, c)$ is as small as possible, and assume that the transmitted word was $c$, and the
message was $e^{-1}(c)$. In general, this requires a search through all the codewords
to find the nearest one to w, a very time-consuming procedure if the code is large! 
One of the themes of algebraic coding theory is that, for codes with more algebraic 
structure, the decoding procedure can be simplified a great deal. 

What if there is no unique nearest codeword? We should design the code so that 
this event is very unlikely, if it can occur at all. Then either choose randomly among 
the nearest neighbours of w, accepting the small chance of making a mistake; or 
ask for the message to be retransmitted. Which strategy we use depends on the 
situation. One important use of error-correction is in obtaining data and pictures 
from interplanetary space probes; here, the length of time taken by a signal means 
that re-transmission is out of the question. But, for commercial transactions between 
banks, the importance of correct information outweighs the cost of a small delay. 
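
A brute-force version of nearest-neighbour decoding might look as follows. It is a sketch for an unstructured code given as a plain list; the function names are illustrative, and all nearest codewords are returned so that the caller can decide what to do in the event of a tie.

```python
def hamming_distance(v, w):
    """Number of positions in which v and w differ."""
    return sum(1 for a, b in zip(v, w) if a != b)

def nearest_neighbours(received, code):
    """All codewords at minimum distance from the received word.

    A single entry means decoding is unambiguous; several entries mean a tie,
    to be settled by a random choice or by a request for retransmission.
    """
    best = min(hamming_distance(received, c) for c in code)
    return [c for c in code if hamming_distance(received, c) == best]
```

With the two-word code `["00000", "11111"]`, the received word `"01001"` decodes unambiguously, whereas with `["0000", "1111"]` the received word `"1100"` is equally close to both codewords. The search visits every codeword, which is the cost the text warns about.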

For a positive integer e, we say that the code C is e-error-correcting if, given
any word w, there is at most one codeword c such that $d(w, c) \le e$. This means that,
if a codeword is transmitted and at most e errors occur, then nearest-neighbour 
decoding will recover the transmitted word uniquely. A related parameter is the 
minimum distance d of the code: this is the smallest distance between two distinct 
codewords. 


(17.2.2) Proposition. A code with minimum distance d is e-error-correcting if and
only if $d \ge 2e + 1$.


PROOF. Suppose that $d \ge 2e + 1$. If a word $w$ lies at distance $e$ or less from two
different codewords $c_1$ and $c_2$, then


\[ d(c_1, c_2) \le d(c_1, w) + d(w, c_2) \le e + e = 2e, \]


a contradiction; so $C$ is $e$-error-correcting.

Conversely, suppose that $d \le 2e$, and let $c_1$, $c_2$ be codewords at distance $d$. Set
$f = \lfloor d/2 \rfloor$; then $f \le e$ and $d - f \le e$. We can move from $c_1$ to $c_2$ by changing $d$
coordinates one at a time. If $w$ is the word obtained after $f$ changes, then we have
$d(c_1, w) = f \le e$ and $d(c_2, w) = d - f \le e$; so $C$ is not $e$-error-correcting.


For example, the code which was used for the trick in the last section has 
minimum distance 3 and is 1-error-correcting. 
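
The proposition can be tested directly on small codes. The sketch below compares the definition of e-error-correcting (checked over every word of the space) with the condition on the minimum distance; the repetition code and the helper names are illustrative choices.

```python
from itertools import combinations, product

def distance(v, w):
    """Hamming distance between two words of equal length."""
    return sum(a != b for a, b in zip(v, w))

def minimum_distance(code):
    """Smallest distance between two distinct codewords."""
    return min(distance(x, y) for x, y in combinations(code, 2))

def is_e_error_correcting(code, e, alphabet):
    """Direct check of the definition: no word lies within distance e of two codewords."""
    n = len(code[0])
    for w in product(alphabet, repeat=n):
        if sum(1 for c in code if distance(w, c) <= e) > 1:
            return False
    return True

# Binary repetition code of length 5 (an example chosen for this sketch).
repetition = [(0,) * 5, (1,) * 5]
agreement = all(
    is_e_error_correcting(repetition, e, (0, 1))
    == (minimum_distance(repetition) >= 2 * e + 1)
    for e in range(4)
)
```

Here the minimum distance is 5, so the code corrects two errors but not three.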


We see that, for good error-correction properties, we require large minimum 
distance. Also, we want the code to have as many words as possible, since the 
number of words limits the number of different messages that can be sent, and 
hence the rate of transmission of information. These two requirements conflict. In 
practice, there is a third requirement as well: the processes of encoding and decoding
should not be too demanding in terms of computational power. (In the case of space 
probes, the encoding has to be done in real time by a small, low-powered electronic 
circuit; the incoming message can be stored and decoded by large computers, but 
we still want the results within hours rather than centuries!) 

The tension between the first two requirements can be formulated as a packing 
problem (compare Section 8.4). The ball of radius $r$ and centre $w$ in the Hamming
space $H(n, q)$ is the set


\[ B_r(w) = \{v \in H(n, q) : d(v, w) \le r\}. \]


(17.2.3) Proposition. The code $C \subseteq H(n, q)$ is e-error-correcting if and only if the
balls of radius e with centres at the codewords are pairwise disjoint.


PROOF. The conclusion is just another way of saying that no word lies at distance e
or less from two codewords.


So we want to know the maximum number of balls of radius e which can be 
packed into Hamming space. 


17.3. Probabilistic considerations 


As we have already suggested, combinatorial coding theory starts from the assumption
that (with a sufficiently high degree of certainty) at most a fixed number e of
errors are made during transmission. This is at base a probabilistic statement. In
this section, we take a superficial look at the probability theory involved, and state
Shannon’s Theorem.

To simplify matters, we consider only the binary alphabet GF(2) = {0,1}. Words 
of fixed length n are transmitted through a channel. We make the following 
assumptions about the channel: 

• the probability that a 0 is changed to a 1 is equal to the probability that a 1 is
changed to a 0;
• this probability p is the same for each digit, and is less than 1/2;
• the events that alterations occur to different digits are independent.

A channel satisfying these assumptions is called a binary symmetric channel. The
assumptions simplify the analysis, but are not very realistic. For example, interference
often comes in ‘bursts’¹ so that if one digit is incorrect then its successor is more
likely to be incorrect also; and the error probability may not be constant (e.g.,
because of synchronisation problems, errors may be more likely at the start of a
word). The assumption p < 1/2 is harmless, and clearly necessary. If p > 1/2, then we
just reverse each digit received and the error probability becomes 1 − p; if p = 1/2, then
the received message is completely random, and no information can be extracted
from it.

We also make the assumption that all words of length n have an equal chance of
being transmitted. Again, this is often false. Much information is sent by encoding
letters, numerals, and punctuation symbols as 7- or 8-bit binary words, using codes
such as ASCII. In plain text, the words representing a space or the letter ‘e’ are
disproportionately likely to occur. Nevertheless, there are encoding schemes which
achieve equiprobability and have the beneficial side-effect of data compression.

¹ A scratch on a compact disc could destroy a run of consecutive bits, for example.

The maximum likelihood decoding method works as follows. Given a received
word w, we decode it to that codeword c such that Prob(c transmitted : w received) 
is maximised. In other words, we assume that the codeword sent is the one most 
likely to result in the given received word. 


(17.3.1) Proposition. Assume that all codewords are equally likely to be transmitted, 
and that the channel is binary symmetric. Then maximum likelihood decoding 
coincides with nearest-neighbour decoding. 


PROOF. In a binary symmetric channel, if $d(c, w) = d$, then $d$ errors (in specified
positions) change $c$ to $w$; so Prob($w$ received : $c$ transmitted) $= p^d(1 - p)^{n-d}$.
Moreover, Prob($c$ transmitted) $= 1/|C|$ by assumption. So


\[ \mathrm{Prob}(c \text{ transmitted} : w \text{ received}) = p^d(1 - p)^{n-d}\,(1/|C|)\,/\,\mathrm{Prob}(w \text{ received}), \]


which is a decreasing function of $d$ (because $p < 1/2$). So it is maximised when $c$ is the codeword
nearest to $w$.
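
The agreement between the two decoding rules can be observed numerically. In this sketch the code (a repetition code of odd length, so that no ties arise) and the error probability 0.1 are arbitrary illustrative choices, and the function names are invented.

```python
from itertools import product

def distance(c, w):
    """Hamming distance between two words of equal length."""
    return sum(a != b for a, b in zip(c, w))

def likelihood(c, w, p):
    """Prob(w received : c transmitted) on a binary symmetric channel."""
    d = distance(c, w)
    return p ** d * (1 - p) ** (len(w) - d)

def maximum_likelihood(w, code, p):
    """Codeword maximising the likelihood of the received word w."""
    return max(code, key=lambda c: likelihood(c, w, p))

def nearest_neighbour(w, code):
    """Codeword at smallest Hamming distance from w."""
    return min(code, key=lambda c: distance(c, w))

code = [(0,) * 5, (1,) * 5]
all_agree = all(
    maximum_likelihood(w, code, 0.1) == nearest_neighbour(w, code)
    for w in product((0, 1), repeat=5)
)
```

Because every codeword is taken to be equally likely, the common factor (1/|C|)/Prob(w received) does not affect which codeword wins, so comparing likelihoods suffices.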


The rate of a code $C$ of length $n$ over an alphabet of size $q$ is defined to
be $\log_q(|C|)/n$. (The motivation for this definition is that if, say, $|C| = q^k$, then
$k$-tuples of information can be encoded in a one-to-one fashion by codewords
and transmitted as $n$-tuples; information is sent $k/n$ times as fast as it would be
without encoding, this being the price paid for error correction.) Shannon proved
the following remarkable theorem.


(17.3.2) Shannon’s Theorem

Given a binary symmetric channel with error probability $p$ for a
single digit,

(a) if $R < 1 + p\log_2 p + (1 - p)\log_2(1 - p)$ and $\epsilon > 0$, there is a
code with rate at least $R$ such that the probability of error
in decoding a codeword by nearest-neighbour decoding is less
than $\epsilon$;

(b) this is best possible, that is, if $R > 1 + p\log_2 p + (1 - p)\log_2(1 - p)$,
then the error probability of any code with rate $R$ is bounded
away from 0.


What is even more remarkable is that the code in Shannon’s Theorem is
constructed by picking the appropriate number of codewords at random! The
number $1 + p\log_2 p + (1 - p)\log_2(1 - p)$ is called the capacity of the channel; it
represents the maximum rate for ‘error-free’ transmission. Shannon’s Theorem
extends to a wider range of situations (arbitrary alphabet size and other channel
characteristics).
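
To get a feel for the numbers, the capacity is easy to evaluate. The function below is a sketch; treating the endpoint values p = 0 and p = 1 as capacity 1 uses the usual convention that p log p tends to 0 as p tends to 0, which is an assumption added here.

```python
from math import log2

def capacity(p):
    """Capacity 1 + p*log2(p) + (1-p)*log2(1-p) of a binary symmetric channel."""
    if not 0 <= p <= 1:
        raise ValueError("p must be a probability")
    if p in (0, 1):
        return 1.0  # limiting value, taking p*log2(p) -> 0 as p -> 0
    return 1 + p * log2(p) + (1 - p) * log2(1 - p)
```

An error probability of about 0.11 already halves the capacity, and at p = 1/2 the capacity is 0, in line with the remark that a completely random received message carries no information.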
It is important, however, to realise three important limitations in Shannon’s
Theorem which mean that it is not the end of coding theory!

• It is non-constructive; the code is constructed at random. This doesn’t help an
engineer who wants an explicit example.

• The length $n$ tends to infinity as $\epsilon \to 0$ or the rate tends to the channel capacity.
Using nearest-neighbour decoding with an unstructured code, it is necessary to
remember the entire received word before decoding can begin; so, in practice, $n$
is bounded by the memory size of the decoder.

• Even for moderate lengths, nearest-neighbour decoding involves a search through
(exponentially many) codewords, a very time-consuming process.


17.4. Some bounds 


As we saw in Section 17.2, there are three desiderata for a good code: 

• high rate (large number of codewords);

• good error-correction, which for us means large minimum distance;

• ease of implementation.
The third of these requires concepts from the theory of computational complexity 
for its proper discussion. Section 20.1 sketches the ideas, but I won’t give a full 
treatment here. Already we see that the first two requirements conflict; and much of 
the mathematical interest in the subject comes from this tension. It can be expressed 
in the form of a question: 


What is the size of the largest code of length n and minimum 
distance d over an alphabet of size q? 


This is the main problem of coding theory. Needless to say, the exact answer is 
known only in special cases. In this section I will prove a simple lower bound and 
several upper bounds. 


(17.4.1) Varshamov–Gilbert bound. Given $n$, $q$, $d$, there is a $q$-ary code of length $n$
and minimum distance $d$ or larger, having at least

\[ q^n \Big/ \left( \sum_{i=0}^{d-1} \binom{n}{i} (q - 1)^i \right) \]

codewords.


PROOF. Recall from Section 17.2 the definition of a ball $B_r(c)$ of radius $r$ in Hamming
space $H(n, q)$: it consists of all words $w$ satisfying $d(c, w) \le r$ for a fixed word $c$
(the centre of the ball). Now

\[ |B_r(c)| = \sum_{i=0}^{r} \binom{n}{i} (q - 1)^i. \]

For a word at distance $i$ from $c$ is obtained by choosing a set of $i$ coordinate
positions in which to make errors (in $\binom{n}{i}$ ways), and changing the symbols in these
positions (each can be changed to any of the other $q - 1$ symbols in the alphabet).
So there are $\binom{n}{i}(q - 1)^i$ words at distance $i$ from $c$, and the result is obtained by
summing over $i$.

Now suppose that $C$ is a code with minimum distance $d$ or larger, and suppose
that the union of all the balls $B_{d-1}(c)$, for $c \in C$, is not the whole of the Hamming
space. Then we can find a word whose distance from each codeword is at least $d$.
Adding it to $C$, we obtain a larger code, still with minimum distance $d$ or larger.

But the number of words lying in these balls does not exceed $|C| \cdot |B_{d-1}(c)|$. So,
if this product is less than $q^n$, then $C$ may be enlarged. So we can continue this
enlargement at least until $|C| \cdot |B_{d-1}(c)| \ge q^n$, the required result.


Note that 

• this is a lower bound, that is, it guarantees a code of the appropriate size;

• the proof is not constructive;

• it is unlikely to be close to best possible (we hope that clever methods will
produce much larger codes).
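
For very small parameters, the enlargement step in the proof can be carried out by exhaustive search. The sketch below scans the words of H(n, q) in lexicographic order and keeps each word that is far enough from those already kept; this greedy order is an assumption made for illustration, and the search takes time exponential in n, so it is of no practical use for long codes.

```python
from itertools import product
from math import comb

def ball_size(n, q, r):
    """Number of words within distance r of a fixed word of H(n, q)."""
    return sum(comb(n, i) * (q - 1) ** i for i in range(r + 1))

def greedy_code(n, q, d):
    """Keep each word, in lexicographic order, whose distance from all kept words is >= d."""
    code = []
    for w in product(range(q), repeat=n):
        if all(sum(a != b for a, b in zip(w, c)) >= d for c in code):
            code.append(w)
    return code

code = greedy_code(7, 2, 3)
# When the scan ends no word can be added, so the balls of radius d - 1 cover the space.
meets_bound = len(code) * ball_size(7, 2, 2) >= 2 ** 7
```

For n = 7, q = 2, d = 3 the bound only promises 128/29, that is, at least 5 codewords.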


Now we turn to upper bounds. We prove three bounds below. In each case, we 
ask the question: what does it mean if the bound is attained? The first bound bears 
a striking resemblance, both in statement and proof, to (17.4.1). 


(17.4.2) Hamming bound, or sphere-packing bound. Suppose that $d \ge 2e + 1$. A $q$-ary
code with length $n$ and minimum distance at least $d$ has at most

\[ q^n \Big/ \left( \sum_{i=0}^{e} \binom{n}{i} (q - 1)^i \right) \]

codewords.


PROOF. By (17.2.3), if $C$ has minimum distance at least $2e + 1$, then the balls of radius
$e$ with centres at the words of $C$ are pairwise disjoint, and so contain $|C| \cdot |B_e(c)|$
words. This number cannot exceed the total number $q^n$ of words. The result follows.
(It should be called the ball-packing bound, but the word ‘sphere’ is often used instead
of ‘ball’ in coding theory.)²

Note that the proof of (17.4.1) is a covering argument while that of (17.4.2) is a 
packing argument. 


(17.4.2a) Equality in the Hamming bound. A code C attains the Hamming bound 
if and only if every word in H(n,q) lies at distance e or smaller from exactly one 
word in C. 


This follows immediately from the proof. A code satisfying this condition is 
called a perfect e-error-correcting code. 
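
The Hamming bound is simple to evaluate exactly. In the sketch below (the function name is illustrative) the bound is returned as a fraction; a perfect code can exist only if the value is an integer, although integrality alone does not guarantee that one exists.

```python
from fractions import Fraction
from math import comb

def hamming_bound(n, q, e):
    """Upper bound q^n / sum_{i<=e} C(n,i)(q-1)^i on the size of an e-error-correcting code."""
    return Fraction(q ** n, sum(comb(n, i) * (q - 1) ** i for i in range(e + 1)))
```

For example, `hamming_bound(7, 2, 1)` equals 16, which is exactly the size of the code used in the trick of Section 17.1, so that code is perfect; `hamming_bound(5, 2, 1)` is 16/3, so no perfect binary 1-error-correcting code of length 5 exists.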


(17.4.3) Singleton bound. A $q$-ary code of length $n$ and minimum distance $d$ has at
most $q^{n-d+1}$ codewords.


² In mathematical usage, a sphere is the surface of a ball.




PROOF. Consider the first n — d + 1 coordinate positions. Two words of C cannot 
agree in all these positions, since they could then differ in at most the remaining d—1 
positions, and their distance would be at most d — 1. So the number of codewords 
doesn’t exceed the number of $(n - d + 1)$-tuples.


(17.4.3a) Equality in the Singleton bound. A code $C$ attains the Singleton bound if
and only if, given any $n - d + 1$ coordinate positions and any $n - d + 1$ symbols from
the alphabet, there is a unique codeword having those symbols in those positions.


This is almost immediate from the proof. (To see that such a code does indeed
have minimum distance (at least) $d$, note that, if two codewords have distance $d - 1$
or less, then they must agree on $n - d + 1$ positions, contrary to the uniqueness
requirement.) A code satisfying this condition is called maximum distance separable,
or MDS.
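
The MDS condition can be checked mechanically for a small code. The sketch below tests, for every choice of n − d + 1 positions, that restricting the codewords to those positions gives each possible tuple exactly once; the even-weight code of length 4 is an example chosen here, not one from the text.

```python
from itertools import combinations, product

def is_mds(code, q, d):
    """True if any n-d+1 positions and symbols determine exactly one codeword."""
    n = len(code[0])
    t = n - d + 1
    if len(code) != q ** t:
        return False
    for positions in combinations(range(n), t):
        restricted = {tuple(c[i] for i in positions) for c in code}
        if len(restricted) != len(code):  # two codewords agree on these positions
            return False
    return True

# Binary words of length 4 with an even number of 1's: minimum distance 2.
even_weight = [w for w in product((0, 1), repeat=4) if sum(w) % 2 == 0]
```

This code has 8 = 2^(4−2+1) words, and any three coordinates determine the fourth, so it attains the Singleton bound.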


(17.4.4) Plotkin bound. Let $\theta = 1 - 1/q$, and suppose that $d > \theta n$. Then a $q$-ary code
with length $n$ and minimum distance $d$ has at most $d/(d - \theta n)$ codewords.


PROOF. The argument is more elaborate than those for the earlier bounds. Let $C$ be
our code, with $M$ codewords, which we imagine as written out in an $M \times n$ array
whose rows are the codewords. We bound in two ways the number $N$ of occurrences
of an ordered pair of different symbols in the same column.

First, since any two rows have Hamming distance at least $d$, there are at
least $d$ columns in which the entries in these rows are different. So

\[ N \ge M(M - 1)d. \tag{1} \]

On the other hand, let $x_{ij}$ be the number of occurrences of the $i$-th symbol in
the $j$-th column. For each such occurrence, there are $M - x_{ij}$ rows where a different
symbol occurs. So the contribution from this column is $\sum_{i=1}^{q} x_{ij}(M - x_{ij})$. But we
have $\sum_{i=1}^{q} x_{ij} = M$. This implies that $\sum_{i=1}^{q} x_{ij}^2 \ge M^2/q$, with equality if and only if
each $x_{ij}$ is equal to $M/q$. So we have

\[ \sum_{i=1}^{q} x_{ij}(M - x_{ij}) = M^2 - \sum_{i=1}^{q} x_{ij}^2 \le M^2 - M^2/q = \theta M^2. \]

Summing the contributions of all $n$ columns, we obtain

\[ N \le n\theta M^2. \tag{2} \]

Combining (1) and (2), we see that $M(M - 1)d \le n\theta M^2$, giving $M(d - \theta n) \le d$.
If $d \le \theta n$, this gives no information; but, if $d > \theta n$, we obtain Plotkin’s bound.
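
The double count in the proof can be followed on a concrete example. The sketch below computes the bound exactly and counts N for the four binary words of length 3 with an even number of 1's; this code and the function names are chosen here purely for illustration.

```python
from fractions import Fraction
from itertools import permutations

def plotkin_bound(n, q, d):
    """d / (d - theta*n) with theta = 1 - 1/q, or None when d <= theta*n."""
    theta = 1 - Fraction(1, q)
    if d <= theta * n:
        return None  # the argument gives no information in this range
    return d / (d - theta * n)

def differing_pairs(code):
    """N: ordered pairs of distinct codewords, counted once for each column where they differ."""
    return sum(a != b for x, y in permutations(code, 2) for a, b in zip(x, y))

code = ["000", "011", "101", "110"]
M, n, d = len(code), 3, 2
N = differing_pairs(code)
```

Here N = 24 coincides with both M(M − 1)d and nθM², so inequalities (1) and (2) hold with equality and the code has exactly `plotkin_bound(3, 2, 2)` = 4 codewords.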


(17.4.4a) Equality in the Plotkin bound. A code $C$ attains the Plotkin bound if and
only if

(a) any two distinct codewords have distance $d$; and

(b) each symbol occurs in a given position in the same number $M/q$ of codewords.

PROOF. The result was obtained by combining two inequalities; so, to meet the
bound, we must meet it in each of these inequalities. From the proof of (1), we
see immediately that equality in (1) is equivalent to condition (a) (which defines an
equidistant code). We noted in the proof of (2) that equality holds if and only if
$x_{ij} = M/q$ for all $i$, $j$. Note that this condition resembles that for equality in the
Singleton bound. We now extract the common features.


An orthogonal array of strength $t$ and index $\lambda$ (with length $n$, over an alphabet
of size $q$) is a set $C$ of $n$-tuples of elements from the alphabet of size $q$, having
the property that, given any $t$ distinct coordinate positions (say $i_1, \ldots, i_t$), and any
$t$ elements $a_1, \ldots, a_t$ of the alphabet (not necessarily distinct), there are precisely $\lambda$
members $c$ of $C$ with the property that they have these entries in these positions;
that is, $c_{i_j} = a_j$ for $j = 1, \ldots, t$. Notice that this definition has a similar ‘flavour’
to the definition of a $t$-design in the last chapter; there is a body of theory which
can be developed for both orthogonal arrays and designs (including the divisibility
conditions, derived designs, Fisher’s Inequality, etc.).

In this language, we have: 

• a code attains the Singleton bound if and only if it is an orthogonal array of
strength $n - d + 1$ and index 1;

• a code attains the Plotkin bound if and only if

(a) it is equidistant with distance $d$;

(b) it is an orthogonal array of strength 1.


17.5. Linear codes; Hamming codes 


In this section, we see the benefits of giving codes more algebraic structure. We get 
simpler encoding and decoding algorithms, as well as some easy constructions of 
good codes. We take our alphabet to be the finite field GF(q). Then the Hamming 
space H(n,q) of all words of length n is an n-dimensional vector space over GF(q). 
We define a linear code to be a vector subspace of H(n, q). 

The weight wt(w) of a word w is the number of non-zero entries in w (this is 
just its distance from the all-zero word). The minimum weight of a code C is the 
smallest weight of a non-zero word in $C$.


(17.5.1) Proposition. (a) For any $v, w \in H(n, q)$, we have $d(v, w) = \mathrm{wt}(v - w)$.
(b) The minimum distance and minimum weight of a linear code $C$ are equal.


PROOF. (a) is clear from the definition, since $v_i - w_i \ne 0 \Leftrightarrow v_i \ne w_i$. For (b), observe
that any weight in $C$ is a distance (since $\mathrm{wt}(w) = d(w, 0)$), and any distance is a
weight (by (a); the linearity of $C$ implies that $v - w \in C$ for all $v, w \in C$). Note
that, in general, finding the minimum distance of a code with $N$ codewords involves
comparing all $\binom{N}{2}$ pairs of codewords, but finding the minimum weight involves
looking only at the $N$ codewords.


(17.5.2) Linear Varshamov–Gilbert bound. If $q$ is a prime power, there is a linear
code attaining the bound of (17.4.1).

PROOF. Recall the proof of (17.4.1): as long as $|C| < q^n/|B_{d-1}(c)|$, we can find a
word $w$ whose distance from any word in $C$ is at least $d$, and adjoin it to $C$ to
form a larger code. In fact, if $C$ is linear, then the subspace spanned by $C$ and $w$
still has minimum weight at least $d$. For a typical word in this space has the form
$c + aw$, where $c \in C$ and $a \in \mathrm{GF}(q)$. If $a = 0$, $c \ne 0$, the weight is at least $d$, by our
assumption about $C$. If $a \ne 0$, then


\[ \mathrm{wt}(c + aw) = \mathrm{wt}(-a^{-1}c - w) = d(-a^{-1}c, w) \ge d, \]


by the assumption on $w$ (and using the linearity of $C$).

Since the cardinality of a linear code is a power of $q$, this allows the Varshamov–Gilbert
bound to be improved. To take an example, consider the case $q = 2$, $d = 3$,
$n = 15$. By (17.4.1), there is a code with these parameters of cardinality at least
$2^{15}/(1 + 15 + \binom{15}{2}) = 270.8\ldots$; that is, at least 271. But (17.5.2) gives a linear code
with at least this many words, and it must have cardinality at least 512. (In fact,
we'll see soon that there is a code with cardinality 2048 but no larger.)
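
The arithmetic in this example can be reproduced exactly. In the sketch below the variable names are illustrative; the only reasoning used is that the size of a binary linear code is a power of 2.

```python
from fractions import Fraction
from math import ceil, comb

# Varshamov-Gilbert bound for q = 2, d = 3, n = 15: ball of radius d - 1 = 2.
vg = Fraction(2 ** 15, 1 + 15 + comb(15, 2))

general = ceil(vg)  # smallest integer that is at least the bound

# A binary linear code has 2^k words, so round up to the next power of 2.
linear = 1
while linear < vg:
    linear *= 2
```

Here `vg` is 32768/121, so `general` is 271 and `linear` is 512, as stated.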


How do we specify a linear code? Since it is a subspace, we can describe it
by giving a basis, a set of k linearly independent words. It is convenient to take
these words as the rows of a $k \times n$ matrix G, called a generator matrix for the code.
In other words, G is a generator matrix of C if and only if its rows are linearly
independent and its row space is C.

Closely related to the row space of a matrix A is its null space, the set of words
w such that $wA^T = 0$. A code C can also be specified by giving a matrix (with
linearly independent rows) whose null space is C. Such a matrix H is called a check
matrix for C.³

Since the rank of a matrix (the dimension of its row space) and its nullity (the
dimension of its null space) sum to n, the number of columns, we see that, if a linear
code has length n and dimension k, then a generator matrix is $k \times n$ and a check
matrix is $(n - k) \times n$.

(17.5.3) Proposition. Let G and H be matrices with linearly independent rows, having
size $k \times n$ and $(n - k) \times n$ respectively. Then G and H are the generator and check
matrices of a code if and only if $GH^T = 0$.


Proof. Suppose that $GH^T = 0$. Then every row of G lies in the null space of H,
so the row space of G is contained in the null space of H. But both spaces have
dimension k, so they are equal. The converse is shown by reversing the argument.


There is a dot product defined on $H(n, q)$, by the rule

$$v.w = \sum_{i=1}^{n} v_i w_i.$$


3 The name comes from the case q = 2, H = (1,1,...,1). For any word w, we find that $wH^T$ is
equal to the number of 1’s in w (mod 2); that is, it is zero if w has even parity and 1 if w has odd
parity. This is called a parity check. The code consists of all words of even weight; by evaluating the
parity check, we can detect (but not correct) a single error. This is often used in serial communication
between computers, where error probabilities are very low and the cost of a re-transmission is small.
(Note that, unlike the case for the Euclidean inner product, it can very well happen
that v.v = 0 for some non-zero vector v. For example, if q = n = 2, consider the
vector (1, 1).)

Now the dual code $C^\perp$ of a linear code C is defined to be

$$C^\perp = \{w \in H(n,q) : v.w = 0 \text{ for all } v \in C\}.$$

It is a linear code, satisfying $\dim(C) + \dim(C^\perp) = n$. Now a vector lies in the null
space of a matrix if and only if its dot product with every row of the matrix is zero.
So the null space of a matrix is just the dual of its row space. In other words:


(17.5.4) Proposition. For any linear code C, the check matrix of $C^\perp$ is equal to the
generator matrix of C, and vice versa.


The generator and check matrices are not just of theoretical interest, but are 
crucial for the encoding and decoding of linear codes. Let C be a linear code of 
dimension k, with generator and check matrices G and H respectively. 


ENCODING. Since $|C| = q^k$, each k-tuple of digits can be encoded as a word of C in
a one-to-one way. The simplest way to do this is as follows. Let v be an arbitrary
k-tuple. Then vG is an n-tuple, and is a member of C, since it is a linear combination
of rows of G (with the elements of v as coefficients). The linear independence of
the rows shows that the map $v \mapsto vG$ is a bijection from $\mathrm{GF}(q)^k$ to C. Moreover,
in the case q = 2, this matrix multiplication can be performed very efficiently by
small, low-power circuits (one of our requirements for efficient encoding, especially
for space probes).
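As an illustration of the encoding map, here is a minimal sketch. It assumes q is prime, so that arithmetic in GF(q) is just arithmetic modulo q; the function name and the sample generator matrix are choices made for this example only.

```python
from itertools import product

def encode(v, G, q=2):
    """Encode the k-tuple v as the n-tuple vG, working modulo a prime q."""
    k, n = len(G), len(G[0])
    return tuple(sum(v[i] * G[i][j] for i in range(k)) % q for j in range(n))

# Hypothetical 4 x 7 binary generator matrix.
G = [(1, 0, 0, 0, 0, 1, 1),
     (0, 1, 0, 0, 1, 0, 1),
     (0, 0, 1, 0, 1, 1, 0),
     (0, 0, 0, 1, 1, 1, 1)]

print(encode((1, 0, 1, 1), G))

# The map is one-to-one: the 2^4 messages give 2^4 distinct codewords.
codewords = {encode(v, G) for v in product([0, 1], repeat=4)}
print(len(codewords))
```

Because this G begins with an identity block, the message reappears as the first four digits of the codeword; that is a property of this particular matrix, not of encoding in general.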


DECODING. This is a little more difficult. Suppose that C is e-error-correcting (that
is, its minimum distance is at least 2e + 1). Let the codeword c be transmitted,
and the word w = c + u be received, where u is the ‘error’ (and we assume that
$\mathrm{wt}(u) \leq e$). The idea of the decoding procedure is that, rather than remove the error
u to reveal the transmitted word c, we remove c to reveal u; this is more reasonable,
since we know more about u, and it is equivalent since knowing u, we can find c by
subtraction.

We calculate the vector $wH^T \in \mathrm{GF}(q)^{n-k}$; this is called the syndrome of
w. Since $cH^T = 0$, the syndrome is equal to $uH^T$, that is, it depends only
on the error pattern. Moreover, distinct error patterns have distinct syndromes.
For, if $u_1 \neq u_2$ and $\mathrm{wt}(u_1), \mathrm{wt}(u_2) \leq e$, then $\mathrm{wt}(u_1 - u_2) \leq 2e$, by the triangle
inequality. But, if $u_1H^T = u_2H^T$, then $u_1 - u_2$ is a non-zero word in the null space
of H, which is C, so $\mathrm{wt}(u_1 - u_2) \geq 2e + 1$, a contradiction.

Hence, in principle, the error pattern u can be recovered from its syndrome
$uH^T$. In practice, this is the difficult part; linear algebra doesn’t help, and we might
use a look-up table.⁴ Once u is found then, as already described, we obtain c by
subtracting u from w.

This method is known as syndrome decoding. 
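A look-up-table decoder along these lines might be sketched as follows. This is an illustration under stated assumptions, not a production decoder: q is taken to be prime, the table is built by brute force over all error patterns of weight at most e, and the sample check matrix is a hypothetical one for a binary code of length 7.

```python
from itertools import combinations, product

def syndrome(w, H, q=2):
    """Compute w H^T modulo a prime q."""
    return tuple(sum(a * b for a, b in zip(w, row)) % q for row in H)

def build_table(H, e, q=2):
    """Map each syndrome to the error pattern of weight <= e producing it."""
    n = len(H[0])
    table = {}
    for t in range(e + 1):
        for positions in combinations(range(n), t):
            for values in product(range(1, q), repeat=t):
                u = [0] * n
                for p, val in zip(positions, values):
                    u[p] = val
                table[syndrome(u, H, q)] = tuple(u)
    return table

def decode(w, H, table, q=2):
    """Look up the error pattern for the syndrome of w and subtract it."""
    u = table[syndrome(w, H, q)]
    return tuple((a - b) % q for a, b in zip(w, u))

# Hypothetical check matrix of a binary 1-error-correcting code of length 7.
H = [(0, 1, 1, 1, 1, 0, 0),
     (1, 0, 1, 1, 0, 1, 0),
     (1, 1, 0, 1, 0, 0, 1)]
table = build_table(H, e=1)
sent = (1, 0, 1, 1, 0, 1, 0)
received = (1, 0, 0, 1, 0, 1, 0)   # one error, in the third position
print(decode(received, H, table))
```

The dictionary plays the role of the table ordered by syndrome: the error producing a given syndrome is found in one look-up.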


4 This would be a table of errors and corresponding syndromes, but ordered by syndrome, so that 
we can quickly find the error producing any syndrome. 
Various information about a code can be read off from these matrices. The most
important example of this is as follows.


(17.5.5) Proposition. A linear code has minimum weight d or greater if and only if
any d - 1 columns of a check matrix for the code are linearly independent.


Proof. The vector $cH^T$ is a linear combination of the columns of H, with coefficients
the elements of c. (Strictly, it is the transpose of this, since $cH^T$ is a row vector.) So
any word of weight m in C gives rise to a linear dependence between m columns of
H (with all coefficients non-zero), and conversely. So the minimum weight of C is
the smallest number of columns which are linearly dependent.


We examine further a special case. A linear code is 1-error-correcting if and only
if it has minimum weight at least 3; by (17.5.5), this occurs if and only if any two
columns of H are linearly independent. In other words, we require that no column
of H is zero, and no column is a multiple of another. For fixed column length d,
let us find the largest such matrix possible.

Define an equivalence relation on the set of all non-zero column vectors of
length d, where two columns are equivalent if and only if one is a scalar multiple
of the other. The columns of our matrix H must belong to different equivalence
classes. How many classes are there? There are $q^d - 1$ non-zero vectors; each one has
q - 1 non-zero multiples, so each equivalence class has size q - 1, and the number of
classes is $(q^d - 1)/(q - 1)$. [You should recognise this from Chapter 9 as the number
of points in the projective space PG(d - 1, q); why are these numbers equal?]

So let H be a $d \times (q^d - 1)/(q - 1)$ matrix whose columns are representatives of
the equivalence classes of non-zero vectors, and let C be the code with check matrix
H. Then C is the Hamming code of length $n = (q^d - 1)/(q - 1)$. [Note that q and
n determine d; the matrix is not unique, but the only ambiguity is in the choice of
representatives and their order; so all codes obtained are equivalent, in a sense to be
defined.]
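One concrete way to pick representatives is to take, from each class, the vector whose first non-zero entry is 1. The sketch below does this for prime q (so that GF(q) is the integers modulo q); the function name and the choice of representatives are assumptions made for illustration, and any other choice gives an equivalent code.

```python
from itertools import product

def hamming_check_matrix(q, d):
    """d x n check matrix with one column from each class of non-zero vectors.

    The representative chosen is the vector whose first non-zero entry is 1.
    """
    cols = [v for v in product(range(q), repeat=d)
            if any(v) and next(x for x in v if x != 0) == 1]
    # Transpose the list of columns into a list of d rows.
    return [[col[i] for col in cols] for i in range(d)]

H = hamming_check_matrix(3, 3)
print(len(H), len(H[0]))
for row in H:
    print(row)
```

The number of columns produced agrees with the count of equivalence classes, $(q^d - 1)/(q - 1)$.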


(17.5.6) Theorem. Hamming codes are perfect 1-error-correcting.


Proof. This means that they attain the Hamming bound! Certainly it follows from
(17.5.5) and the subsequent discussion that a Hamming code is 1-error-correcting.
Now its length is $n = (q^d - 1)/(q - 1)$, and its dimension is n - d, so the number of
codewords is

$$q^{n-d} = q^n/q^d = q^n/(1 + n(q - 1)),$$

and the right-hand side is the Hamming bound for e = 1.


Syndrome decoding works especially well for Hamming codes. The syndrome is
a d-tuple. If it is zero, then no error has occurred. If it is non-zero, then there is
a unique column j and scalar $\alpha$ such that the syndrome is $\alpha v_j$, where $v_j$ is the j-th
column of H. Then the error occurred in position j, where $\alpha$ was added; and so
subtracting $\alpha$ from the j-th coordinate of the received word corrects the error.
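The procedure just described can be sketched directly. As before, this assumes q is prime and uses a hypothetical choice of column representatives (first non-zero entry equal to 1); the search over columns and scalars is written for clarity rather than speed.

```python
from itertools import product

def hamming_check_matrix(q, d):
    """One column per class of non-zero vectors (first non-zero entry 1)."""
    cols = [v for v in product(range(q), repeat=d)
            if any(v) and next(x for x in v if x != 0) == 1]
    return [[col[i] for col in cols] for i in range(d)]

def hamming_decode(w, H, q):
    """Correct at most one error in w, for a Hamming code over a prime field."""
    d, n = len(H), len(H[0])
    s = [sum(w[j] * H[i][j] for j in range(n)) % q for i in range(d)]
    if not any(s):
        return tuple(w)                      # zero syndrome: no error
    for j in range(n):
        col = [H[i][j] for i in range(d)]
        for alpha in range(1, q):
            if [(alpha * x) % q for x in col] == s:
                corrected = list(w)
                corrected[j] = (corrected[j] - alpha) % q
                return tuple(corrected)
    raise ValueError("syndrome is not a multiple of any column")

H = hamming_check_matrix(3, 2)       # ternary Hamming code of length 4
sent = (2, 2, 1, 0)                  # a codeword: its syndrome is zero
received = (2, 2, 1, 2)              # 2 was added in the last position
print(hamming_decode(received, H, 3))
```

For a Hamming code the final `raise` is never reached, since every non-zero syndrome is a scalar multiple of exactly one column.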
If this sounds familiar, re-read the discussion in Section 17.1. The code used
there was the binary Hamming code of length $7 = (2^3 - 1)/(2 - 1)$.⁵ In the binary
case, the only non-zero scalar is 1, so the equivalence relation is equality, and the
columns of the check matrix are all the non-zero binary triples, that is, the base 2
representations of the numbers 1,...,7. If we arrange them in the obvious order,
then the syndrome is the base 2 representation of the number of the position where
the error occurred! You should now be able to generalise the method, and explain
(for example) how to determine any chosen number between 0 and 2047 in 15
guesses, if one lie is allowed.
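The binary case can be checked in a few lines. In this sketch (function names are assumptions) the j-th column of the check matrix is j written in base 2, most significant digit first, and the syndrome is read back as an integer.

```python
def binary_check_matrix(r):
    """r x (2^r - 1) matrix whose j-th column (j = 1, 2, ...) is j in base 2."""
    n = 2 ** r - 1
    return [[(j >> (r - 1 - i)) & 1 for j in range(1, n + 1)] for i in range(r)]

def error_position(w, r):
    """Read the syndrome of w as a base 2 number; 0 means no error detected."""
    H = binary_check_matrix(r)
    bits = [sum(a * b for a, b in zip(w, row)) % 2 for row in H]
    return int("".join(map(str, bits)), 2)

codeword = (1, 1, 1, 0, 0, 0, 0)     # columns 1, 2, 3 sum to zero
received = (1, 1, 1, 0, 0, 1, 0)     # error in position 6
print(error_position(codeword, 3), error_position(received, 3))
```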


In general, how do we compute the check matrix from the generator matrix, or
vice versa? The key is the following observation. Let A be a $k \times (n - k)$ matrix.
Set $G = (I_k \; A)$ and $H = (-A^T \; I_{n-k})$. Clearly G and H have ranks k and n - k
respectively; and $GH^T = 0$, so if G is the generator matrix of a code then H is the
check matrix, and vice versa. Now suppose that we are given an arbitrary generator
matrix G. By applying elementary row operations to it, we can put it into reduced
echelon form (see Chapter 9). Moreover, elementary row operations don’t change
the row space (i.e., the code with the given matrix as generator). If we are lucky,
the reduced echelon form will have the shape $(I \; A)$ for some A; this means that
the leading 1’s occur in the first k columns. If so, we can read off the check matrix
$(-A^T \; I)$ directly. In general, we have to apply some permutation of the columns to
bring the leading ones into the first k columns, write down the check matrix H, and
then apply the inverse permutation to H.
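A sketch of this computation follows, assuming q = p is prime and the rows of G are linearly independent. Rather than permuting columns explicitly, it records which columns carry the leading 1's and places the entries of $-A^T$ and of the identity in the corresponding positions, which has the same effect as permuting and permuting back. The function names are illustrative; `pow(x, -1, p)` requires Python 3.8 or later.

```python
def rref_mod_p(G, p):
    """Reduced echelon form of G over GF(p); returns (matrix, pivot columns)."""
    M = [list(row) for row in G]
    pivots, r = [], 0
    for c in range(len(M[0])):
        pr = next((i for i in range(r, len(M)) if M[i][c] % p), None)
        if pr is None:
            continue
        M[r], M[pr] = M[pr], M[r]
        inv = pow(M[r][c], -1, p)
        M[r] = [(x * inv) % p for x in M[r]]
        for i in range(len(M)):
            if i != r and M[i][c]:
                f = M[i][c]
                M[i] = [(a - f * b) % p for a, b in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
        if r == len(M):
            break
    return M, pivots

def check_matrix(G, p):
    """Check matrix of the code generated by G (rows assumed independent)."""
    R, pivots = rref_mod_p(G, p)
    n = len(G[0])
    free = [c for c in range(n) if c not in pivots]
    H = []
    for f in free:
        row = [0] * n
        row[f] = 1                       # identity part
        for i, pc in enumerate(pivots):
            row[pc] = (-R[i][f]) % p     # entries of -A^T
        H.append(row)
    return H

G = [[1, 0, 0, 0, 0, 1, 1],
     [0, 1, 0, 0, 1, 0, 1],
     [0, 0, 1, 0, 1, 1, 0],
     [0, 0, 0, 1, 1, 1, 1]]
for row in check_matrix(G, 2):
    print(row)
```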


17.6. Perfect codes 


Perfect codes had great importance early in the history of coding theory, when two of 
the pioneers, Hamming and Golay, found some interesting examples. As time went 
on and very few further examples were found, engineers lost interest and turned 
to larger and more flexible classes of codes. But perfect codes are unexpectedly 
important to mathematicians. 


We begin with perfect 1-error-correcting codes, and consider first codes over 
prime-power-size alphabets. 


(17.6.1) Proposition. Let q be a prime power. 

(a) A perfect 1-error-correcting code over an alphabet of size q has length
$(q^d - 1)/(q - 1)$ for some integer d > 1.

(b) A linear perfect 1-error-correcting code is a Hamming code. 


Proof. (a) Such a code C satisfies $|C| = q^n/(1 + n(q - 1))$; so 1 + n(q - 1) divides
$q^n$. Suppose that $q = p^a$ where p is prime, and write $1 + n(q - 1) = q^d p^b$, for some
integers d and b with $0 \leq b < a$. Then $p^b \equiv q^d p^b \equiv 1 \pmod{q - 1}$. Since $1 \leq p^b < q$,
we must have $p^b = 1$, whence $1 + n(q - 1) = q^d$, and $n = (q^d - 1)/(q - 1)$, as required.

(b) Let $n = (q^d - 1)/(q - 1)$. Now $|C| = q^n/(1 + n(q - 1)) = q^{n-d}$, so a check
matrix H for C is $d \times n$. But C is 1-error-correcting, so the columns of H are pairwise


5 The matrix H used there is actually the transpose of what we defined as the check matrix. 
inequivalent, with respect to the equivalence relation defined before (17.5.6). Since n
is equal to the number of equivalence classes, we have one column from each class,
and H determines a Hamming code, according to our definition.


Some examples of non-linear codes with the same parameters as Hamming codes 
are known. 


For non-prime-power alphabets, almost nothing is known. The next result is the 
sum total of our knowledge. 


(17.6.2) Proposition. There is no perfect 1-error-correcting code of length 7 over an
alphabet of size 6.


Proof. Suppose that C is such a code, over the alphabet {1,...,6}. We have
$|C| = 6^7/(1 + 7 \cdot 5) = 6^5 = 6^{7-3+1}$. Thus C, as well as being perfect, is also MDS (it
meets the Singleton bound). By (17.4.3a), we see:

(*) given $a_1, \ldots, a_5 \in \{1, \ldots, 6\}$, there are unique
elements $a_6, a_7$ such that $(a_1, \ldots, a_7) \in C$.

Now fix $a_1, a_2, a_3$, and define two $6 \times 6$ matrices $M = (m_{ij})$, $N = (n_{ij})$ by the
rule that $m_{ij} = k$ and $n_{ij} = l$ if and only if $(a_1, a_2, a_3, i, j, k, l) \in C$. It follows easily
from (*) that M and N are two orthogonal Latin squares of order 6, contradicting
the proof of Euler’s conjecture by Tarry (see Chapters 1, 9).

Unfortunately, since the generalised form of Euler’s conjecture is false in all 
other cases, this argument really is a one-off! 


Now we consider e-error-correcting codes for e > 1. The first case is q = 2,
e = 2. Such a code C satisfies

$$|C| = 2^n \Big/ \left(1 + n + \binom{n}{2}\right) = 2^{n+1}/(n^2 + n + 2).$$

So $n^2 + n + 2 = 2^a$ for some a. Multiplying by 4 and setting x = 2n + 1, y = a + 2,
we find

$$x^2 + 7 = 2^y.$$

This is Nagell's equation, named after the mathematician who found all the solutions
of this equation in integers⁶ (in 1930, some time earlier than the development
of coding theory). The solutions are $(x, y) = (\pm 1, 3), (\pm 3, 4), (\pm 5, 5), (\pm 11, 7)$ and
$(\pm 181, 15)$. In our situation, the code C is 2-error-correcting, and so has minimum
distance 5; so $n \geq 5$, and $x \geq 11$. Thus, only the lengths 5 and 90 are possible. We'll
see that there is a unique such code of length 5, one of an infinite (but trivial) class,
and no such code of length 90.
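A short search reproduces the listed solutions within any finite range. It is only a numerical check, not a proof that no others exist; the function name and the search bound are choices made here.

```python
from math import isqrt

def nagell_solutions(max_y):
    """Positive x with x^2 + 7 = 2^y, for 3 <= y <= max_y."""
    solutions = []
    for y in range(3, max_y + 1):
        x = isqrt(2 ** y - 7)
        if x * x + 7 == 2 ** y:
            solutions.append((x, y))
    return solutions

found = nagell_solutions(200)
print(found)
# Candidate code lengths n = (x - 1) / 2 with n >= 5:
print([(x - 1) // 2 for x, _ in found if (x - 1) // 2 >= 5])
```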


The repetition code of length n over the alphabet A is the simplest code imaginable,
consisting of all words (a,a,...,a) of length n for $a \in A$. If |A| = 2 and
n = 2e + 1, then this code is perfect e-error-correcting: any word w of length n has either
more zeros than ones (and is closer to (0,0,...,0)), or more ones than zeros (and is
closer to (1,1,...,1)).


6 The solution of Nagell’s equation is about at the limit of what can be covered in an undergraduate 
course in algebraic number theory. See I. Stewart and D. Tall, Algebraic Number Theory (1987). 
The non-existence for length 90 will be shown later.


Now consider e = 3, q = 2. The code C satisfies

$$|C| = 2^n \Big/ \left(1 + n + \binom{n}{2} + \binom{n}{3}\right).$$

This gives a Diophantine equation a bit like Nagell’s, of the form $f(n) = 3 \cdot 2^a$,
where f is a cubic polynomial. We seem to be worse off than before; but in fact this
is not so, since the polynomial f happens to factorise. Putting m = n + 1, a little
manipulation gives

$$m(m^2 - 3m + 8) = 3 \cdot 2^a,$$

where $m \geq 8$ (since $n \geq 7$ for a 3-error-correcting code).
Now there are two cases:


CASE 1. $m = 2^b$ and $m^2 - 3m + 8 = 3 \cdot 2^c$ for some b, c. If $m \geq 16$, then
$m^2 - 3m + 8 \equiv 8 \pmod{16}$, so $m^2 - 3m + 8 = 24$, which is impossible. So m = 8,
n = 7, and we have a repetition code.


CASE 2. $m = 3 \cdot 2^b$ and $m^2 - 3m + 8 = 2^c$ for some b, c. As before, if $m \geq 48$, then
$m^2 - 3m + 8 \equiv 8 \pmod{16}$, so $m^2 - 3m + 8 = 8$, a contradiction. So m = 12 or 24.
In the first case, $m^2 - 3m + 8 = 116$ is not a power of 2. So the possibility m = 24,
n = 23 remains. Golay discovered a perfect binary code with these parameters,
which was later shown to be unique (up to a suitable definition of isomorphism).
This is the so-called binary Golay code.
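The case analysis can be cross-checked numerically. The sketch below (the function name and the search limit are assumptions) lists the lengths n at which the size of a ball of radius e in H(n, 2) is a power of 2, which is the necessary condition used above.

```python
from math import comb

def candidate_lengths(e, max_n):
    """Lengths n >= 2e + 1 for which 1 + C(n,1) + ... + C(n,e) is a power of 2."""
    result = []
    for n in range(2 * e + 1, max_n + 1):
        ball = sum(comb(n, i) for i in range(e + 1))
        if ball & (ball - 1) == 0:       # power-of-two test
            result.append(n)
    return result

print(candidate_lengths(3, 1000))   # the two lengths found in Cases 1 and 2
print(candidate_lengths(2, 1000))   # the two lengths from Nagell's equation
```

Passing this test does not show that a perfect code exists; it only identifies the lengths that survive the counting condition.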


Golay also discovered a ternary perfect 2-error-correcting code of length 11, 
which is also unique. Now, to cut a long story short, Tietäväinen proved the 
following result. 


(17.6.3) Tietäväinen’s Theorem. For e > 1, the only perfect e-error-correcting codes
of length n over alphabets of prime-power size q are the binary repetition codes (with
q = 2, n = 2e + 1) and the binary and ternary Golay codes (with q = 2, e = 3, n = 23
and q = 3, e = 2, n = 11 respectively).


The Golay codes, with their related designs, lattices, and groups, are of enormous 
importance, which can only be hinted at here. The next result gives a connection 
between codes and designs. For ease of exposition, we consider linear codes only. 
The support of any word is the set of coordinate positions where its entries are 
non-zero. 


(17.6.4) Proposition. Let C be a linear perfect e-error-correcting code of length n
over GF(q). Then the supports of the codewords of smallest weight 2e + 1 in C are
the blocks of an $(e + 1)$-$(n, 2e + 1, (q - 1)^e)$ design, each block repeated q - 1 times.


Proof. That the minimum weight is 2e + 1 is clear. Now choose any set of e + 1
coordinates, and let w be any word whose support is this set. (There are $(q - 1)^{e+1}$
such words w.) There is a unique codeword c with $d(c, w) \leq e$. Now $c \neq 0$, so
$\mathrm{wt}(c) \geq 2e + 1$. It follows that wt(c) = 2e + 1 and the support of c contains that of
w (and c agrees with w on its support). So C contains $(q - 1)^{e+1}$ words of weight
2e + 1 whose support contains the given (e + 1)-set. Now a non-zero scalar multiple of
such a codeword has the same support; so there are $(q - 1)^e$ supports of size 2e + 1
containing the given (e + 1)-set, each repeated q - 1 times. So we have a design with
the stated parameters.
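The smallest instance of (17.6.4) can be verified by brute force. The sketch below takes q = 2 and e = 1 with the binary Hamming code of length 7, for which the proposition predicts a 2-(7, 3, 1) design; the check matrix is written out explicitly and the variable names are illustrative.

```python
from itertools import combinations, product

# Check matrix whose columns are the numbers 1..7 in base 2.
H = [(0, 0, 0, 1, 1, 1, 1),
     (0, 1, 1, 0, 0, 1, 1),
     (1, 0, 1, 0, 1, 0, 1)]

# The code is the null space of H over GF(2).
code = [w for w in product([0, 1], repeat=7)
        if all(sum(a * b for a, b in zip(w, row)) % 2 == 0 for row in H)]

# Blocks: supports of the codewords of minimum weight 2e + 1 = 3.
blocks = {frozenset(i for i, x in enumerate(w) if x) for w in code if sum(w) == 3}

# Count the blocks through each (e + 1)-set, here each pair of coordinates.
counts = {pair: sum(1 for b in blocks if set(pair) <= b)
          for pair in combinations(range(7), 2)}
print(len(code), len(blocks), set(counts.values()))
```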


REMARK. Linearity is only used here to show that the zero word is in C, and that
each block is repeated equally often. Now let C be a binary perfect code, not
necessarily linear. By translation in $H(n, 2)$, we may assume that $0 \in C$. Then the
‘design’ has $\lambda = 1$, and the question of repeated blocks doesn’t arise. Thus, the
conclusion of (17.6.4) holds for q = 2 without the assumption of linearity.

This enables us to complete the discussion of binary perfect 2-error-correcting
codes. The possibility of such a code of length 90 was left open; but its existence
would imply that of a 3-(90, 5, 1) design, in which (by (16.1.3)) the number of blocks
containing two points is 88/3, a contradiction.
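The figure 88/3 can be written out as a worked equation. This assumes the standard count for a t-(v, k, λ) design, namely that the number $\lambda_s$ of blocks containing a given set of s points (s ≤ t) is $\lambda \binom{v-s}{t-s} / \binom{k-s}{t-s}$, which is taken here to be the content of (16.1.3) being invoked:

```latex
\lambda_2 \;=\; \lambda \binom{v-2}{t-2} \Big/ \binom{k-2}{t-2}
          \;=\; 1 \cdot \binom{88}{1} \Big/ \binom{3}{1}
          \;=\; \frac{88}{3} \notin \mathbb{Z}
\qquad (t = 3,\ v = 90,\ k = 5,\ \lambda = 1).
```

Since $\lambda_2$ counts blocks, it must be an integer, and so no such design exists.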


According to this result, the binary and ternary Golay codes (which are both 
linear) give rise to 4-(23, 7, 1) and 3-(11, 5, 4) designs. The latter is actually a 4-(11, 
5, 1) design. Moreover, these designs can be extended to 5-designs. This is done by 
extending the codes by an overall parity check (a new coordinate position such that 
the entry in that position in any word is chosen so that the sum of all its entries 
is zero). The supports of words of minimum weight in the extended codes form 
extensions of the designs: they are a 5-(24, 8, 1) design and a 5-(12, 6, 1) design. 
These were the first 5-designs known; their automorphism groups are the Mathieu
groups $M_{24}$ and $M_{12}$, the first of the ‘sporadic’ simple groups to be discovered,
and the only 5-transitive permutation groups apart from symmetric and alternating
groups.⁷


17.7. Linear codes and projective spaces 


The basic properties of a code can be expressed in terms of Hamming distance. So 
it is reasonable to call two codes equivalent if one can be transformed into the other 
by an isometry (a distance-preserving transformation) of Hamming space $H(n, q)$. It
can be shown that any isometry can be built out of two kinds of transformation: 
(a) permutation of the symbols appearing in any coordinate position, where the 
permutations applied to different coordinates are independent; 

(b) permutations of the coordinates. 

These generate the wreath product of the symmetric groups $S_q$ (on symbols) and $S_n$
(on coordinates), in its product action as defined in Chapter 14 (see Exercise 4).


However, if we are interested in linear codes, then this definition of equivalence 
is too wide, since the symbol permutations don’t in general preserve the property 
of linearity. Assuming that the alphabet is a field F, we should only allow in (a) 
the multiplication of each coordinate by a non-zero scalar, the scalars applied to 
different coordinates being independent; we can compose this with an arbitrary 


7 The groups were constructed by means of generating permutations by Mathieu, half a century 
before the designs were found by Skolem and Witt, which in turn predated the discovery of the codes 
by Golay. 
coordinate permutation as in (b). Such a transformation is called monomial, and the
equivalence relation defined on linear codes is called monomial equivalence. (Thus,
a monomial transformation is one represented by a matrix which has a single
non-zero entry in each row and in each column.) For example, all Hamming codes with
the same parameters are monomially equivalent.


How many monomial equivalence classes of codes are there? We won't answer 
this question with a formula, but will translate it into projective geometry, revealing 
an unexpected but very important connection between these fields. 


(17.7.1) Theorem. There is a bijection between
• monomial equivalence classes of linear 1-error-correcting codes of length n and
dimension n - d over GF(q); and
• orbits of the general linear group GL(d, q) on n-element spanning subsets of the
projective space PG(d - 1, q).


REMARK. Orbits of GL(d, q) can be regarded as ‘geometric configurations’. For
example, all conics in the projective plane PG(2, q) (see Section 9.7) form an orbit.


Proof. We show that each set corresponds in a natural way to an equivalence
class of matrices under a relation intermediate between row-equivalence and row- 
and-column-equivalence. To be precise, given a d x n matrix, we allow ourselves to 
apply arbitrary row operations, but restrict the allowable column operations to two 
types, viz., multiplication of a column by a non-zero scalar, and interchange of two 
columns. (These column operations obviously generate the group of all monomial 
transformations, while the row operations generate the whole general linear group 
of invertible linear transformations.) Now we consider equivalence classes (under 
this equivalence relation) of d x n matrices A such that 

(a) the rows of A are linearly independent; 

(b) any two columns of A are linearly independent. 


STEP 1. Given a linear code C of length n and dimension n - d with minimum
weight at least 3, its check matrix A satisfies the two conditions (a) and (b) above;
and C is determined as the null space of A, or equivalently as $C_0^\perp$, where $C_0$ is
the row space of A. Now elementary row operations have the effect of changing
the basis of the row space, and so don’t alter C; and monomial transformations of
the columns replace C by a monomially equivalent code. So equivalence classes of
matrices correspond to monomial equivalence classes of codes.


STEP 2. Let S be a spanning set of n points in PG(d - 1, q), say $S = \{p_1, \ldots, p_n\}$. Let
$v_i$ be a vector spanning the 1-dimensional subspace $p_i$, and let A be the matrix with
columns $v_1, \ldots, v_n$. Obviously, any two columns of A are linearly independent. Also,
since $v_1, \ldots, v_n$ is a spanning set, some d-element subset is a basis; so the rows of A
are linearly independent. Multiplying columns by non-zero scalars doesn’t change
the points of projective space they span, and permuting columns merely affects the
order in which the elements of S are written down. So monomial transformations
of columns don’t affect S. On the other hand, elementary row operations generate
the general linear group GL(d, q), and so S can be transformed into any other set
in the same orbit by a sequence of such operations.

These two steps establish the required bijections, and prove the theorem. But it
is much more than an enumeration result. Structural properties can be translated
back and forth between code and geometry. For example:
• Any word of weight 3 in C corresponds to a set of three dependent columns of
A, and hence to a set of three collinear points of S.
• An element of the dual code $C^\perp$ corresponds to a linear map from $F^d$ to F,
whose kernel defines a hyperplane of the projective space. So the supports of
non-zero elements of $C^\perp$ correspond to the complements of hyperplane sections
of S.


17.8. Exercises 


1. Write down a check matrix for the ternary Hamming code of length 13. Hence 
(a) decode the received word (1,2,1,0,2,1,0,0,1,0,2, 1,0); 

(b) construct a generator matrix for the code. 

2. Show that an orthogonal array of strength t and index $\lambda$ over an alphabet of
size q has cardinality $\lambda q^t$, and is also an orthogonal array of strength i and index
$\lambda q^{t-i}$ for all i < t.

3. Show that the design whose blocks are the supports of words of minimum weight
in the q-ary Hamming code of length $(q^d - 1)/(q - 1)$ is isomorphic to the design
whose blocks are the collinear triples of points in the projective space PG(d - 1, q).

4. Show that the group of isometries of the Hamming space $H(n, q)$ is the wreath
product $S_q \,\mathrm{wr}\, S_n$, in its product action (see Chapter 14).


5. (COMPUTER PROJECT). Investigate solutions of the sphere-packing condition for
the existence of a perfect code, viz. $\sum_{i=0}^{e} \binom{n}{i}(q - 1)^i$ divides $q^n$.


6. (a) Prove that, if C is a linear MDS code, then $C^\perp$ is also MDS.

(b) Show that the code C corresponding to a set S of points in PG(d - 1, q)
(as in (17.7.1)) is MDS if and only if S has the property that no d of its points
are contained in a hyperplane. (Such a set is called an arc.) Deduce that conics in
PG(2, q) give rise to MDS codes.


7. (a) Prove that the dual of a Hamming code of length $(q^d - 1)/(q - 1)$ has minimum
weight $q^{d-1}$ and attains the Plotkin bound.

(b) Let A be a Hadamard matrix of order n (see Section 16.6). Normalise the first
column to -1 and delete it; then change -1 to 0 throughout. Show that the code C
whose words are the resulting rows attains the Plotkin bound. When is it linear?


8. (a) Show that, for binary codes, the Hamming bound is always at least as strong
as the Singleton bound. Hence or otherwise show that any MDS binary code is
equivalent to a repetition code or the dual of one.

(b) Prove that a q-ary perfect 1-error-correcting code has length at least q + 1.


18. Graph colourings 


On the bank of the river he saw a tall tree: from roots to crown one half was 
aflame and the other green with leaves. 


‘Peredur son of Evrawg’ 
from The Mabinogion (earlier than 1325) 


Topics: Vertex and edge colourings; perfect graphs; graph minors; 
embeddings of graphs in surfaces 


TECHNIQUES: Use of Max-Flow Min-Cut Theorem for construc- 
tions; alternating chain arguments 


ALGORITHMS: 


CROSS-REFERENCES: Graphs, networks (Chapter 11); Hall’s theorem
(Chapter 6); posets (Chapter 12); [symmetric functions (Chapter
13)]


In Chapter 11, we took the point of view that graphs model connectivity. Here,
the viewpoint is that graphs model ‘incompatibility’. For example, suppose that
radio frequencies are being allocated to a number of transmitters. Some pairs of
transmitters are so close that their transmissions would interfere, and they must
be allocated different frequencies. How many frequencies are required? A more
classical example is the map colouring problem, where countries sharing a common
frontier must be given different colours on a map; how many colours does the
cartographer need?¹

We define a vertex colouring of the graph Γ = (V, E) to be a function c from
V to a set of colours such that, for any edge $\{x, y\} \in E$, we have $c(x) \neq c(y)$.
In the frequency-assignment problem, the transmitters are the vertices, and the
incompatible pairs edges, of a graph Γ; a legitimate frequency assignment is a


1 The celebrated four-colour problem, asking whether four colours always suffice, was invented by
Francis Guthrie, who communicated it (via his brother Frederick) to his mathematics professor at
University College London, Augustus De Morgan, in 1852. Two common myths about its origin are:
• It was known to cartographers for centuries. Unfortunately there is no evidence at all for this!
• It was posed by Möbius in a lecture in 1840. The problem Möbius actually asked was whether
there exists a map with five countries, any two sharing a frontier. Clearly such a map would
require five colours; but its non-existence (which we prove in (18.6.3)) doesn’t guarantee that no
other map needs five colours.
For further information, see N. L. Biggs, E. K. Lloyd and R. J. Wilson, Graph Theory 1736-1936.
vertex-colouring of Γ. Similar remarks apply to the map-colouring problem. In
each case, we are interested in the smallest number of colours for which a colouring
exists.

Note that the only graphs which have vertex colourings with a single colour are
the null graphs. More generally, a clique in a graph Γ is a set of vertices, any pair
joined by an edge (so that the induced subgraph² is complete), and a coclique is a
set of vertices containing no edges (so that the induced subgraph is null). So, in a
vertex colouring, each colour class is a coclique.


18.1. More on bipartite graphs


As defined in Chapter 11, a graph Γ = (V, E) is bipartite if there is a partition
$V = X \cup Y$, $X \cap Y = \emptyset$, so that every edge has one end in X and the other in Y. The
partition of V is called a bipartition of the graph and its parts are bipartite blocks.
A connected graph has a unique bipartition.

Thus, in our new terminology, a graph has a vertex colouring with two colours
if and only if it is bipartite: the colour classes form a bipartition.
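This equivalence also gives a simple test for bipartiteness. The sketch below is an illustration using breadth-first search (a technique not described in the text); it assumes the graph is given as a dictionary of adjacency lists, and the function name is a choice made here.

```python
from collections import deque

def two_colouring(adj):
    """Return a vertex colouring with colours 0 and 1, or None if none exists.

    `adj` maps each vertex to the list of its neighbours.
    """
    colour = {}
    for start in adj:
        if start in colour:
            continue                      # component already coloured
        colour[start] = 0
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in colour:
                    colour[w] = 1 - colour[v]
                    queue.append(w)
                elif colour[w] == colour[v]:
                    return None           # an edge inside a colour class
    return colour

square = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
print(two_colouring(square), two_colouring(triangle))
```

When a colouring is returned, its two colour classes form a bipartition of the graph.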

The results in this section seem somewhat unconnected with colourings. Some 
connections will emerge in the rest of the chapter. 


(18.1.1) Proposition. If the largest coclique in a bipartite graph Γ = (V, E) has size
m, then V can be partitioned into m subsets each of which is a vertex or an edge.


Proof. Let {X, Y} be a bipartition of Γ. Let $Y = \{y_1, \ldots, y_n\}$ and, for i = 1, ..., n,
let $A_i$ be the set of neighbours of $y_i$ (so that $A_i \subseteq X$). We use a variant of Hall’s
Marriage Theorem (see Chapter 6, Exercise 7):

    If a family $(A_1, \ldots, A_n)$ of subsets of X satisfies $|A(J)| \geq |J| - r$
    for all $J \subseteq \{1, \ldots, n\}$, then there is a subfamily of size n - r which
    has a SDR.

Take any $J \subseteq \{1, \ldots, n\}$. Then $\{y_i : i \in J\} \cup (X \setminus A(J))$ is a coclique; so
$|J| + |X| - |A(J)| \leq m$, or

$$|A(J)| \geq |J| - (m - |X|).$$

So there is a subfamily of size d = n - (m - |X|) = |X| + |Y| - m which has a SDR;
that is, a set of this many disjoint edges of Γ. If we add in the remaining |X| - d
uncovered vertices in X and |Y| - d uncovered vertices in Y, we obtain altogether

$$d + (|X| - d) + (|Y| - d) = m$$

disjoint vertices and/or edges whose union is V.


A matching⁴ is a set of pairwise disjoint edges, and an edge-cover is a set of
vertices meeting every edge.




2 Recall from Chapter 11 that an induced subgraph of Γ consists of a subset of its vertices, together
with all edges contained within that set.

3 Sometimes the term ‘clique’ is used in a more restrictive sense: it is required that no further vertex
is joined to every vertex in the clique, that is, it is maximal with respect to inclusion. (No outsider
can be admitted to a clique.)

4 Sometimes called a ‘partial matching’ to distinguish from a ‘complete’ or ‘perfect’ matching which
covers all vertices.
(18.1.2) Proposition. In a bipartite graph, the size of the largest matching is equal
to the size of the smallest edge-cover.


This is König’s Theorem (11.10.2).


(18.1.3) Proposition. Let Γ be a bipartite graph with maximum valency d. Then the
edge set of Γ can be partitioned into d partial matchings.


Proof. Before beginning, we remark that the result is true in the case when Γ is
regular of valency d. For (6.2.3) guarantees the existence of a perfect matching
(equivalently, a SDR for the neighbour sets of the vertices in one bipartite block),
and then an easy induction gives the result.


The general proof is by induction on the number of edges. As usual, starting the
induction is trivial. So assume that the theorem holds for graphs with fewer edges
than Γ. Let (X, Y) be a bipartition of Γ, and e = {x, y} an edge of Γ, with $x \in X$
and $y \in Y$. Then the edges of Γ - e can be partitioned into d matchings. It is easier
to visualise the edges as being coloured with d colours 1,2,...,d, so that a vertex
lies on at most one edge of each colour.

Since x and y have valency less than d in Γ - e, at least one colour does not
occur on the edges at each of them. If the same colour is missing at both x and y,
we can use it to colour the edge e. So we may suppose that colour 1 is missing at
x, and colour 2 at y.

Set x = un and define v, u2,v2,-.. by the rule that {u;,v;} has colour 2 and 
{v;,ui41} has colour 1, as long as such edges exist. Note that all vertices u; belong 
to X and all v; to Y. The sequence cannot revisit any vertex, so it must terminate; 
and, by assumption, it cannot terminate at either x or y (for example y # Vn since 
y lies on no edge of colour 2). Now we can interchange the colours 1 and 2 on the 
edges of this path without violating the condition that no two edges of the same 
colour meet at a vertex, As a result, colour 2 is no longer used on an edge through 
z, and we can give this colour to e. 
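
The inductive step of this proof can be run as an algorithm: add the edges one at a time and, whenever the two ends miss different colours, interchange those colours along the alternating path. The following is a minimal sketch in Python, assuming a simple bipartite graph given as a list of pairs (x, y); the function name and the data layout are illustrative choices, not part of the text.

```python
def bipartite_edge_colouring(edges):
    """Properly colour the edges of a simple bipartite graph with d colours.

    `edges` is a list of pairs (x, y) with x in one bipartite block and y in
    the other; d is the maximum valency. Returns a dict edge -> colour, with
    colours numbered 0, ..., d-1.
    """
    # Tag the endpoints so that a label may be reused in both blocks.
    tagged = [(("X", x), ("Y", y)) for x, y in edges]
    valency = {}
    for u, v in tagged:
        valency[u] = valency.get(u, 0) + 1
        valency[v] = valency.get(v, 0) + 1
    d = max(valency.values(), default=0)

    # at[w][c] is the vertex joined to w by the edge of colour c, if any.
    at = {w: {} for w in valency}
    for u, v in tagged:
        a = next(c for c in range(d) if c not in at[u])  # a colour missing at u
        b = next(c for c in range(d) if c not in at[v])  # a colour missing at v
        if a != b:
            # Follow the chain from u along colours b, a, b, a, ...
            path, c = [u], b
            while c in at[path[-1]]:
                path.append(at[path[-1]][c])
                c = a if c == b else b
            # Interchange the two colours along the chain.
            steps = list(zip(path, path[1:]))
            for i, (p, q) in enumerate(steps):
                old = b if i % 2 == 0 else a
                del at[p][old], at[q][old]
            for i, (p, q) in enumerate(steps):
                new = a if i % 2 == 0 else b
                at[p][new], at[q][new] = q, p
        # Colour b is now missing at both ends, so it can be given to the edge.
        at[u][b], at[v][b] = v, u

    colouring = {}
    for (x, y), (u, v) in zip(edges, tagged):
        colouring[(x, y)] = next(c for c, w in at[u].items() if w == v)
    return colouring
```

Each colour class of the result is a partial matching, so the d classes give the partition promised by (18.1.3).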


REMARK. The method of proof is called the alternating chains argument. 


A closely related result is the Gale-Ryser Theorem, which determines the possible 
valencies of bipartite graphs. 


(18.1.4) Gale-Ryser Theorem. Let x_1 ≥ ... ≥ x_m and y_1 ≥ ... ≥ y_n be positive 
integers. Then the following are equivalent: 

(a) there exists a bipartite graph for which the valencies in the two bipartite blocks 
are x_1,...,x_m and y_1,...,y_n; 

(b) the following two conditions hold: 

\[ \sum_{i=1}^{m} x_i = \sum_{j=1}^{n} y_j \quad\text{and}\quad \sum_{i=1}^{k} x_i \le \sum_{j=1}^{n} \min(k, y_j) \ \text{ for } k = 1,\ldots,m. \]


PROOF. The necessity of the conditions is straightforward. The first equation 
Σ x_i = Σ y_j counts in two ways the total number of edges in the graph. For the 
second, consider the k vertices of largest valency in the first block. They lie on 
Σ_{i=1}^{k} x_i edges. But the j-th vertex in the second block lies on at most min(k, y_j) of 
these edges. 

For the sufficiency, we outline a proof using the Max-Flow Min-Cut Theorem. 
This demonstrates the use of this theorem in combinatorial constructions, of which 
there are many more examples. We construct a network with vertices s (source), 
a_1, ..., a_m, b_1, ..., b_n, t (target), and edges as follows: 

• (s, a_i) with capacity x_i, for i = 1,...,m; 
• (b_j, t) with capacity y_j, for j = 1,...,n; 
• (a_i, b_j) with capacity 1, for i = 1,...,m and j = 1,...,n. 

The edges out of s and the edges into t both form cuts with capacity 
Σ_{i=1}^{m} x_i = Σ_{j=1}^{n} y_j = M, say. Suppose that S is any cut; say that S contains 
(s, a_{i_1}), ..., (s, a_{i_k}) and (b_{j_1}, t), ..., (b_{j_l}, t). Then S must also contain 
(a_i, b_j) for i ≠ i_1,...,i_k and j ≠ j_1,...,j_l; its capacity is 

\[ \sum_{p=1}^{k} x_{i_p} + \sum_{q=1}^{l} y_{j_q} + (m - k)(n - l), \]

and a little calculation (using the conditions (b) of the theorem) shows that this is 
at least M. 
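
One way to carry out the calculation is sketched below. Here I = {i_1,...,i_k} and J = {j_1,...,j_l} are shorthand introduced only for this sketch. The first step uses the ordering x_1 ≥ ... ≥ x_m, the second is condition (b) with m − k in place of k (for k = m both sides are trivial), and the third bounds each minimum by y_j when j ∈ J and by m − k otherwise.

```latex
\begin{align*}
\sum_{i \notin I} x_i
  &\le \sum_{i=1}^{m-k} x_i
   \le \sum_{j=1}^{n} \min(m-k,\, y_j)
   \le \sum_{j \in J} y_j + (m-k)(n-l), \\
\sum_{p=1}^{k} x_{i_p} + \sum_{q=1}^{l} y_{j_q} + (m-k)(n-l)
  &\ge \sum_{i \in I} x_i + \sum_{i \notin I} x_i = M.
\end{align*}
```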

So the minimum capacity of a cut is M. From the Max-Flow Min-Cut and 
Integrity Theorems (Section 11.9), we conclude that there is an integral flow of value 
M. In such a flow, all edges (s, a_i) and (b_j, t) must carry their full capacity, since they 
lie in minimum cuts. Edges (a_i, b_j) carry flow 0 or 1. Let V = {a_1, ..., a_m, b_1, ..., b_n} 
and let E be the set of pairs {a_i, b_j} for which (a_i, b_j) carries flow 1. Since the flow 
out of a_i is equal to x_i, this vertex has valency x_i in the bipartite graph (V, E). 
Similarly, b_j has valency y_j. 


This proof is, in some sense, algorithmic, since the proof of the Max-Flow 
Min-Cut Theorem is constructive. Gale’s original paper gives a much more directly 
constructive proof. 
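
The conditions of the theorem are easy to test mechanically. The sketch below checks condition (b) and also attempts a direct construction by a simple greedy rule (join each a_i to the x_i vertices b_j with the largest remaining valency). The greedy rule is an illustrative substitute for the network-flow argument, not the proof given above, and the function names are hypothetical; the second function verifies its own output and returns None if the attempt fails.

```python
def gale_ryser_condition(xs, ys):
    """Check condition (b) of the Gale-Ryser Theorem for valency lists xs, ys."""
    xs = sorted(xs, reverse=True)
    if sum(xs) != sum(ys):
        return False
    return all(
        sum(xs[:k]) <= sum(min(k, y) for y in ys)
        for k in range(1, len(xs) + 1)
    )


def greedy_realisation(xs, ys):
    """Try to build edges (i, j) giving a_i valency xs[i] and b_j valency ys[j]."""
    remaining = list(ys)          # valency still needed at each b_j
    edges = []
    for i, x in enumerate(xs):
        # Prefer the vertices b_j that still need the most edges.
        by_need = sorted(range(len(ys)), key=lambda j: -remaining[j])
        chosen = by_need[:x]
        if len(chosen) < x or any(remaining[j] == 0 for j in chosen):
            return None           # the greedy attempt got stuck
        for j in chosen:
            remaining[j] -= 1
            edges.append((i, j))
    return edges if not any(remaining) else None
```

For example, the valencies (3, 2, 1) and (2, 2, 1, 1) satisfy the conditions, while (4, 1, 1) and (2, 2, 2) fail already at k = 1.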


18.2. Vertex colourings 


The chromatic number of a graph Γ = (V, E), written χ(Γ), is the least number r 
of colours such that Γ has a vertex colouring with r colours. Equivalently, it is the 
least r such that V can be partitioned into r cocliques. The introduction to this 
chapter motivated the study of this invariant; but its computation is difficult. We 
note that, if Γ has a clique of size c, then all vertices of this clique must receive 
different colours in any vertex colouring; so the chromatic number is at least c. But 
this invariant is also hard to calculate! An upper bound for χ(Γ), easier to compute, 
is the maximum valency of Γ, apart from a few exceptional graphs. (This is the content 
of Brooks’ Theorem, to be proved in the next section.) 

One formal approach to the chromatic number is via the chromatic polynomial, 
which is the function f_Γ on the natural numbers defined by 

f_Γ(r) = number of colourings of Γ with the set {1,...,r} of colours. 

Because of the name, you will not be surprised to learn that f_Γ(r) is a polynomial 
in r, though this is not obvious; it will emerge from a recursive calculation of this 
number. Note that χ(Γ) is the least r for which f_Γ(r) > 0. 


EXAMPLE. If K_n and N_n are the complete and null graphs on n vertices, then the 
colours used in a colouring of K_n must all be distinct, while those used for N_n are 
unrestricted. By (3.7.1), 

\[ f_{K_n}(r) = (r)_n = r(r-1)\cdots(r-n+1), \qquad f_{N_n}(r) = r^n. \]


Let e = {x, y} be an edge of Γ = (V, E). We define two operations on Γ: 

• Deletion of e yields the graph Γ − e = (V, E \ {e}). 

• Contraction of e: replace x and y by a new vertex z, with an edge {v, z} whenever 
{v, x} ∈ E or {v, y} ∈ E; edges not containing x or y are unaltered. Call the 
resulting graph Γ/e. 


(18.2.1) Theorem. f_Γ(r) = f_{Γ−e}(r) − f_{Γ/e}(r). 


PROOF. We divide the set of colourings of Γ − e into two disjoint classes: 
• Those for which x and y receive different colours. These are the valid colourings 
of Γ. 
• Those for which x and y receive the same colour. Such a colouring induces a 
colouring of Γ/e, and conversely. 
So f_{Γ−e}(r) = f_Γ(r) + f_{Γ/e}(r), as required. 


Now, if Γ is given and e is an edge of Γ, then Γ − e has fewer edges, and Γ/e 
fewer vertices, than Γ. Assuming inductively that their chromatic polynomials are 
known, that of Γ can be calculated. The induction begins with graphs without edges, 
for which we calculated the number of colourings already. This inductive argument 
also proves that the chromatic polynomial of Γ is a polynomial in r of degree n, 
where n is the number of vertices of Γ. 
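
The recursion of (18.2.1) translates directly into a program. The following sketch evaluates the chromatic polynomial at a given r by deletion and contraction; it assumes a simple graph (no loops or repeated edges), and the function name and the representation of edges as two-element sets are illustrative choices. The running time is exponential in the number of edges, so it is only suitable for small graphs.

```python
def count_colourings(vertices, edges, r):
    """Number of vertex colourings of a simple graph with colours 1, ..., r."""
    vertices = frozenset(vertices)
    edges = frozenset(frozenset(e) for e in edges)
    if not edges:
        # Null graph: every vertex may take any colour.
        return r ** len(vertices)
    e = next(iter(edges))
    x, y = tuple(e)
    deleted = edges - {e}                      # edge set of the graph minus e
    # Contraction: merge y into x. Storing edges in a set collapses any
    # parallel edges that the merge would create.
    contracted = {frozenset(x if v == y else v for v in f) for f in deleted}
    return (count_colourings(vertices, deleted, r)
            - count_colourings(vertices - {y}, contracted, r))


triangle = [(1, 2), (2, 3), (1, 3)]
print(count_colourings({1, 2, 3}, triangle, 3))   # 3 * 2 * 1 = 6
```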


18.3. Project: Brooks’ Theorem 


Brooks’ Theorem asserts that, with known exceptions, a connected graph with 
maximum valency d has a vertex colouring with d colours. 


First note that a graph has a vertex colouring with a given number of colours if and only if 
all its connected components do; so it is enough to consider connected graphs. Also, the fact that 
a graph with maximum valency d can be coloured with d + 1 colours is straightforward to prove. 
Consider the vertices one at a time. Each vertex v has at most d neighbours, to which at most d 
colours have been applied; so there is an unused colour available for v. 
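
The argument just given is the familiar greedy colouring. A minimal sketch, assuming the graph is supplied as a dictionary mapping each vertex to a list of its neighbours (a representation chosen here for illustration):

```python
def greedy_colouring(adj):
    """Colour the vertices one at a time, using at most d + 1 colours."""
    d = max((len(nbrs) for nbrs in adj.values()), default=0)
    colour = {}
    for v in adj:
        # Colours already applied to neighbours of v: at most d of them.
        used = {colour[w] for w in adj[v] if w in colour}
        colour[v] = next(c for c in range(d + 1) if c not in used)
    return colour
```

On a 5-cycle (d = 2) this uses three colours, which is best possible; Brooks’ Theorem says that such behaviour is exceptional.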

We cannot expect to reduce d + 1 to d here without paying a price. The complete graph on d + 1 
vertices has valency d but obviously requires d + 1 colours. Also, a circuit of odd length is divalent 
but not bipartite (that is, not 2-colourable). Brooks’ Theorem asserts that these graphs are the only 
(connected) exceptions. 

The proof of Brooks’ Theorem repeats the above argument with more care, in the ‘general’ case 
(Case 1 in the argument below), ensuring that some colour does not appear among the neighbours 
of a vertex when we come to colour it. The other two cases are more in the nature of minor irritants. 




First, a piece of terminology. Let k be a positive integer. A graph Γ is said to be k-connected if, 
for any k − 1 vertices v_1, ..., v_{k−1} of Γ, the graph Γ − v_1 − ... − v_{k−1} obtained by deleting them is 
connected. 


(18.3.1) Brooks’ Theorem 
A connected graph with maximum valency d, which is neither a complete graph 
nor a cycle of odd length, has a vertex colouring with d colours. 


PROOF. The proof is by induction; we assume that the theorem is true for all graphs with fewer 
vertices than Γ (and, in particular, for all proper induced subgraphs of Γ). Assume that Γ is neither 
complete nor an odd cycle. We divide the proof into three cases. 


Case 1. Γ is 3-connected. Since it is not complete, there are two vertices u, w of Γ at distance 2. Let 
v be a common neighbour of u and w. Let v_1 = u, v_2 = w. Now Γ − u − w is connected. We define a 
partial order on its vertices by the rule that x < y if y lies on a shortest path from v to x. Note that 
v is the unique maximal element in this order. Take a linear extension of the partial order (12.2.1), 
say v_3 < v_4 < ... < v_n. We have v_n = v. Moreover, by construction, for any i with 3 ≤ i < n, there 
exists j > i such that v_i is joined to v_j. 

Take d colours 1,2,...,d. Now give colour 1 to v_1 and v_2 (this is legitimate since they are not 
joined). Colour the remaining vertices in turn. For 3 ≤ i < n, at most d − 1 neighbours of v_i are 
already coloured (since it has a neighbour later in the sequence), so there is a colour available for v_i. 
Finally, when we reach v_n, all its neighbours are already coloured, but two of them (v_1 and v_2) have 
the same colour; so there is a colour available for v_n. 


Case 2. Γ is not 2-connected. Thus there is a vertex v whose removal disconnects Γ; and the vertices 
different from v can be partitioned into non-empty subsets X and Y such that no edge goes from X 
to Y. By induction, one of two possibilities holds: 

(a) Each of the induced subgraphs on X ∪ {v} and Y ∪ {v} can be coloured with d colours. We 
can change the names of the colours so that v has the same colour in each colouring, and we 
have a colouring of Γ. 

(b) One of X ∪ {v} and Y ∪ {v}, say X ∪ {v}, carries either a complete graph K_{d+1}, or a cycle of 
odd length (with d = 2). But this is impossible, since v would have d neighbours in X and at 
least one in Y. 


Case 3. Γ is 2-connected but not 3-connected. In this case there are two vertices u, v whose removal 
disconnects Γ, say into X and Y as above. The argument of Case 2 applies except in one situation: 
(c) Each of X ∪ {u,v} and Y ∪ {u,v} requires d colours. Moreover, in any colouring of X ∪ {u,v} 
with d colours, u and v have the same colour; and in any colouring of Y ∪ {u,v} with d colours, 
u and v have different colours. (In particular, u and v are not joined.) 
Let x_u and x_v be the valencies of u and v in X ∪ {u,v}, and y_u, y_v their valencies in Y ∪ {u,v}. 
Now x_u + y_u ≤ d and x_v + y_v ≤ d. Also, at least d − x_u colours are available for u in X ∪ {u,v}, 
and similarly for v and for Y ∪ {u,v}. Now the colours of u and v in X ∪ {u,v} must be uniquely 
determined, or we could change one of them and violate (c); so x_u = x_v = d − 1. Similarly, the sets 
of colours available for u and v in Y ∪ {u,v} must be disjoint; so (d − y_u) + (d − y_v) ≤ d, whence 
y_u + y_v ≥ d. Thus 
2(d − 1) + d ≤ 2d, 


or d ≤ 2. But then Γ is either an odd cycle or bipartite, and the theorem is proved. 


18.4. Perfect graphs 


We've seen that, for any graph Γ, the chromatic number χ(Γ) (the smallest number 
of cocliques into which Γ can be partitioned) is not less than the clique number ω(Γ) 
(the size of the largest clique in Γ). Obviously, graphs in which equality holds are 
interesting. Claude Berge realised that, to obtain a manageable theory, we should 
require this condition also for all induced subgraphs of Γ. Thus, a graph Γ is perfect 
if, for every induced subgraph Δ of Γ, the chromatic number and clique number of 
Δ are equal. 

We'll also look at complements. (The complement Γ̄ of Γ has the same vertex set 
as Γ; two distinct vertices are joined in Γ̄ if and only if they are not joined in Γ.) 
The cliques of Γ̄ are the cocliques of Γ, and vice versa. So the clique number of Γ̄ is 
equal to the coclique number of Γ (the size of its largest coclique), and the chromatic 
number of Γ̄ is the clique-partition number of Γ (the smallest number of cliques into 
which it can be partitioned). 


A number of earlier results can be phrased to say that certain graphs are perfect. 
Note that, if a class of graphs is closed under taking induced subgraphs, then to 
prove that every graph in the class is perfect, we have the seemingly easier task 
of proving that every graph in the class has chromatic number and clique number 
equal. 


(18.4.1) Proposition. (a) Bipartite graphs are perfect. 
(b) Complements of bipartite graphs are perfect. 


PROOF. Both classes are induced-subgraph-closed, so we show that either type has 
clique number and chromatic number equal. For bipartite graphs, this is trivial: 
both numbers are 2 unless the graph is null (in which case they are 1). For (b), this 
is the content of (18.1.1). 


The line graph of a graph Γ = (V, E) is defined as follows. The vertex set of 
L(Γ) is E, the edge set of Γ; two vertices e_1, e_2 are joined in L(Γ) if and only if (as 
edges of Γ) they have a common vertex. There are two kinds of cliques in L(Γ): 

(a) a set of edges of Γ through a common vertex; 
(b) the edge set of a triangle (3-cycle). 

Case (b) cannot occur in a bipartite graph Γ. So the clique number of L(Γ) is the 
maximum valency of Γ. Similarly, the coclique number of L(Γ) is the size of the 
largest partial matching in Γ. Thus, (18.1.2) and (18.1.3) can be phrased as follows: 


(18.4.2) Proposition. (a) Line graphs of bipartite graphs are perfect. 
(b) Complements of line graphs of bipartite graphs are perfect. 
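
The line graph construction used here is simple to compute. A small sketch, assuming the graph is given as a list of edges (pairs of vertices); the function name is an illustrative choice:

```python
from itertools import combinations


def line_graph(edges):
    """Return (vertices, adjacent pairs) of the line graph of a simple graph."""
    vertices = [frozenset(e) for e in edges]
    # Two edges of the original graph are joined when they share a vertex.
    joined = [(e, f) for e, f in combinations(vertices, 2) if e & f]
    return vertices, joined
```

For the star with three edges through a common vertex, the line graph is a triangle, a clique of type (a); for the 4-cycle, the line graph is again a 4-cycle.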


Another two classes of perfect graphs arise from posets (Chapter 12). Let 
P = (X, ≤) be a poset. Two distinct points x, y ∈ X are comparable if x < y or y < x, 
and incomparable otherwise. The comparability graph and incomparability graph of P 
are the graphs with vertex set X whose edges are the comparable and incomparable 
pairs of vertices respectively. (Of course, these graphs are complementary.) Now 
Dilworth’s Theorem (12.5.3) and its (much easier) dual (12.5.2) translate as follows. 


(18.4.3) Proposition. (a) Comparability graphs of posets are perfect. 
(b) Incomparability graphs of posets are perfect. 

In view of these results (and others: see, for example, Exercise 2), the next 
theorem is no surprise. It was conjectured by Berge (under the name ‘weak perfect 
graph conjecture’), and proved by Lovász. I will state it here without proof. 


(18.4.4) Perfect Graph Theorem. The complement of a perfect graph is perfect. 


To get some idea of the power concealed in this harmless-looking theorem, note 
that using it we may deduce Dilworth’s Theorem from its ‘trivial’ dual, and similarly 
Hall’s Theorem from an even more trivial result. So we can’t expect to find a 
three-line proof of the Perfect Graph Theorem. 


I conclude with the main open problem on perfect graphs, which was also 
conjectured by Berge. It is clear that, if n is odd and n > 3, then the n-cycle C_n is 
not perfect: it has clique number 2 and chromatic number 3. It is also not difficult 
to show that the complement C̄_n of C_n fails to be perfect. Thus a graph which contains 
either C_n or C̄_n as an induced subgraph for n odd and n > 3 also fails to be perfect. 
We call such induced subgraphs odd holes and odd antiholes. Berge conjectured that 
these are the only obstructions to perfection: 


(18.4.5) Strong Perfect Graph Conjecture. A graph is perfect if and only if it contains 
no odd hole or odd antihole. 
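
The imperfection of odd cycles and of their complements can be confirmed by brute force for small n. The sketch below is exponential and meant only for tiny graphs; a graph on vertices 0, ..., n−1 is represented by a function `joined(u, v)`, and all names are illustrative assumptions.

```python
from itertools import combinations, product


def clique_number(n, joined):
    """Size of the largest set of pairwise joined vertices among 0..n-1."""
    best = 0
    for k in range(1, n + 1):
        if any(all(joined(u, v) for u, v in combinations(s, 2))
               for s in combinations(range(n), k)):
            best = k
    return best


def chromatic_number(n, joined):
    """Least r for which some assignment of r colours is a proper colouring."""
    pairs = [(u, v) for u, v in combinations(range(n), 2) if joined(u, v)]
    for r in range(1, n + 1):
        for colours in product(range(r), repeat=n):
            if all(colours[u] != colours[v] for u, v in pairs):
                return r


def cycle(n):
    return lambda u, v: (u - v) % n in (1, n - 1)


def complement(joined):
    return lambda u, v: u != v and not joined(u, v)
```

For n = 7 this gives clique number 2 and chromatic number 3 for the cycle, and clique number 3 and chromatic number 4 for its complement, so equality fails in both cases.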


If true, this would imply the Perfect Graph Theorem, since the class of graphs 
which satisfy the conclusion of the conjecture (which are nowadays called Berge 
graphs) is obviously closed under complementation. But the conjecture has so far 
defeated a small army of graph theorists! 


18.5. Edge colourings 


An edge colouring of a graph Γ = (V, E) is a map c from E to a set of colours with 
the property that two edges sharing a vertex have different colours. 

Using the notion of line graph defined in the preceding section, we see that an 
edge colouring of a graph is exactly the same thing as a vertex colouring of its line 
graph. So, in a sense, the theory of edge colourings is a part of the theory of vertex 
colourings; but it has its own particular style and results. The chromatic index of Γ 
is defined to be the least number of colours required for an edge colouring of Γ. 


In an edge colouring, all the edges which meet at a vertex must have different 
colours. So the chromatic index of Γ cannot be smaller than its maximum valency. 
The following theorem of Vizing restricts the chromatic index to two possible values: 


(18.5.1) Vizing’s Theorem. If a graph has maximum valency d, then it has an edge 
colouring with d + 1 colours. 


So the chromatic index is either d or d + 1. Accordingly, the class of all graphs 
can be divided into two parts. A graph Γ belongs to Class 1 if its chromatic index 
is equal to its maximum valency, and to Class 2 otherwise. According to (18.1.3), all 
bipartite graphs belong to Class 1. 

We met edge colourings of complete graphs, in rather different language, in 
Section 8.6: 


An edge colouring of the complete graph K_n with the smallest 
possible number of colours is the same thing as a tournament 
schedule for n teams. 


(The teams are the vertices of the graph, the rounds of the tournament are the 
colours of the edges.) In particular, the chromatic index of K_n is n if n is odd, n − 1 
if n is even. In other words: 


(18.5.2) Proposition. The complete graph K_n belongs to Class 1 if n is even, and to 
Class 2 if n is odd. 
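
For even n, a schedule with n − 1 rounds can be written down explicitly. The sketch below uses the standard ‘circle method’ (one team stays fixed while the others rotate); it is one possible construction and is not claimed to be the one given in Section 8.6, and the function name is illustrative.

```python
def round_robin(n):
    """Edge colouring of K_n (n even) with n - 1 colours, as a list of rounds.

    Teams are 0, ..., n-1; each round is a list of n/2 disjoint pairs.
    """
    if n % 2 != 0:
        raise ValueError("n must be even; for odd n add a dummy team (a bye)")
    rounds = []
    for r in range(n - 1):
        # Team n-1 is fixed and meets team r in round r.
        pairs = [(n - 1, r)]
        # The remaining teams are paired symmetrically about r, modulo n-1.
        for k in range(1, n // 2):
            pairs.append(((r + k) % (n - 1), (r - k) % (n - 1)))
        rounds.append(pairs)
    return rounds
```

Since n − 1 is odd, each pair {i, j} of rotating teams meets in the unique round r with 2r ≡ i + j (mod n − 1), so every edge of K_n receives exactly one colour.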


18.6. Topological graph theory 


Although graphs are abstract objects, it is safe to assume that most people think of 
them as ‘dots and lines’, the way we’ve drawn them many times already. In other 
words, we choose some familiar geometric or topological space as a drawing board, 
and represent the vertices by distinct points of the space; each edge is represented 
by a line or curve whose endpoints correspond to its vertices. 

The question ‘What is a curve?’ is a difficult one which took mathematicians 
nearly a century to resolve. Peano, for example, constructed a continuous curve 
passing through every point of the unit square. But such curves don’t aid intuition. 
We assume that an edge is represented by a continuous, piecewise smooth curve (one 
having a continuously varying tangent everywhere except perhaps a finite number 
of ‘corners’). 

For applications such as road layouts and map colouring, we impose a further 
condition: 


The curves representing two edges are disjoint except for the point 
representing their common vertex (if any). 


We call a drawing of Γ satisfying this condition an embedding of Γ in the space. 
It isn’t clear whether a given graph can be embedded in a given space. In three 
dimensions, there is no restriction: 


(18.6.1) Proposition. Any graph can be embedded in R^3. 


PROOF. Take a line L, and represent the vertices by points of L. For each edge e, 
take a plane Π_e through L (all these planes distinct), and join the vertices of e by a 
semicircle in Π_e. 

In two dimensions, the situation is very different. We call a graph planar if it 
is embeddable in the Euclidean plane. Some experimentation should convince you 
that the complete graph K_5 and the complete bipartite graph K_{3,3} are non-planar. 
We'll see that this is a consequence of a theorem of Euler. 

First note that embedding in the plane and in the surface of the sphere are 
‘equivalent’ concepts. This is because of stereographic projection, which establishes 
a bijection, smooth in both directions, between the plane and the sphere with its 
north pole removed (Fig. 18.1). Using this bijection, an embedding in the plane 
is transferred to the sphere. Conversely, given an embedding in the sphere, choose 
a point lying on none of the curves and use it as the north pole; then projection 
transfers the embedding to the plane. 


Fig. 18.1. Stereographic projection 

Now let Γ be a connected graph. Given an embedding of Γ in the plane or sphere, 
the removal of the image of the embedding leaves a finite number of connected 
pieces called faces. In the case of the plane, just one face (the infinite face) is 
unbounded. The boundary of a face is a closed curve made up of a finite number of 
vertices and the same number of edges (possibly with repetitions), corresponding to 
a closed trail in Γ. (The connectedness of the face boundary depends on that of Γ: 
can you see why?) The face itself is topologically equivalent to a disc (the interior 
of a circle), except in the case of the infinite face in the plane. 


(18.6.2) Euler’s Theorem. Let an embedding of the connected graph Γ in the plane 
have V vertices, E edges and F faces. Then 


V − E + F = 2. 


PROOF. We use induction on E. A connected graph with one vertex and no edge has 
one face, and satisfies the theorem. 

Suppose that there is an edge e such that Γ − e is connected. Then Γ − e has 
V vertices, E − 1 edges and F − 1 faces, since, when e is removed, the two faces 
on either side of it coalesce. (We have to show that these two faces are different. 
Suppose not; let f be this face. There is a curve in f from one side of e to the other. 
When e is removed, this becomes a simple closed curve in f. By the Jordan Curve 
Theorem,⁵ this curve divides the plane or sphere into two components, each of which 
contains a vertex of e; so Γ − e is not connected.) So V − (E − 1) + (F − 1) = 2, 
and we are done. 


5 The Jordan Curve Theorem asserts that a simple (non-intersecting) closed plane curve has an 
‘inside’ and an ‘outside’; that is, its complement has two connected components, just one of which is 
unbounded. 

So we may assume that there is no such edge e. Then Γ is a tree. (Choose a 
spanning tree T of Γ, by (11.2.2). If T ≠ Γ, then the removal of an edge outside 
T leaves a connected graph.) Thus, E = V − 1, by (11.2.1). Moreover, F = 1. So 
V − E + F = 2, as required. 

If you find that proof a bit unsatisfactory in its (unspoken) appeals to geometrical 
or physical intuition, you should read Imre Lakatos’ Proofs and Refutations (1976). 
Euler’s Theorem is used as a test case for an investigation of mathematical rigour, and 
plausible ‘counterexamples’ are used to refine and make precise both the statement 
of the theorem and the arguments used in the proof. (If you are happy with the 
above proof, then there is even more reason for you to read the book!) 


(18.6.3) Corollary. K_5 and K_{3,3} are non-planar. 


PROOF. (a) K_5 has 5 vertices and 10 edges, so an embedding would have 7 faces. But 
each face has at least three edges (a face with 1 or 2 edges can only occur if there 
are loops or parallel edges in the graph), while each edge bounds at most two faces. 
Double-counting incident edge-face pairs shows that the number of faces is at most 
10 · 2/3 = 6⅔, a contradiction. 

(b) K_{3,3} has 6 vertices and 9 edges, so 5 faces (if embedded in the plane). Now 
each face has at least four edges: for the graph is bipartite and has no closed trail 
of odd length. The same argument as before then shows that there are at most 4½ 
faces, a contradiction. 
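
The two counts in this proof can be checked with exact arithmetic. In the sketch below, `faces_by_euler` is the number of faces forced by Euler’s Theorem and `face_bound` is the double-counting bound 2E/s, where s is the least number of edges on a face; both names are illustrative.

```python
from fractions import Fraction


def faces_by_euler(V, E):
    """Number of faces of a plane embedding, from V - E + F = 2."""
    return 2 - V + E


def face_bound(E, shortest_face):
    """Upper bound on F: each edge bounds at most two faces, and each face
    has at least `shortest_face` edges."""
    return Fraction(2 * E, shortest_face)


# K_5: 7 faces would be needed, but at most 20/3 are possible.
print(faces_by_euler(5, 10), face_bound(10, 3))
# K_{3,3}: 5 faces would be needed, but at most 9/2 are possible.
print(faces_by_euler(6, 9), face_bound(9, 4))
```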


From this result, we can give further examples of non-planar graphs. A subdivision 
of a graph Γ is obtained by repeated application of the operation ‘insert a 
vertex into an edge’: replace the edge {x,y} by two edges {x,v} and {v,y}, where 
v is a new vertex. It’s clear that embeddability in any space is unaffected: choose 
any point on the path from x to y to represent v, and let the two ‘halves’ of this 
path represent {x,v} and {v,y}. So any subdivision of K_5 or K_{3,3} is non-planar, as 
is any graph containing a subgraph of this form. A still more general construction 
involves minors of a graph. 

A graph Γ_0 is said to be a minor of Γ if it can be obtained from Γ by a series of 
deletions and contractions. (See Section 18.2, where it was shown that the chromatic 
number of Γ is determined by its proper minors.) Note that a graph can be obtained 
from any subdivision by contraction; so, if a subdivision of Γ_0 is a subgraph of Γ, 
then Γ_0 is a minor of Γ. 

The class of planar graphs is closed under taking minors. (It is clear that 
deleting an edge from a planar graph gives a planar graph. Contraction is a little 
less obvious. Imagine a continuous deformation in which the curve representing the 
edge shrinks to a point.) So a planar graph has no K_5 or K_{3,3} minor. Remarkably, 
the converse is true: 


(18.6.4) Kuratowski-Wagner Theorem. The following conditions on a graph Γ are 
equivalent: 

(a) Γ is planar; 

(b) Γ contains no subdivision of K_5 or K_{3,3}; 

(c) Γ has no minor isomorphic to K_5 or K_{3,3}. 

On the basis of this and other evidence, it was conjectured by Wagner that any 
minor-closed class of graphs is determined by a finite set of ‘forbidden minors’. This 
has been proved recently in a major piece of work by Robertson and Seymour: 


(18.6.5) Robertson-Seymour Theorem. Let C be a class of graphs which is closed 
under taking minors. Then there is a finite set S of graphs with the property that 
Γ ∈ C if and only if no member of S is a minor of Γ. 


What about other 2-dimensional surfaces? Topologists have produced a complete 
classification of closed surfaces (without boundary points or infinite points), which 
we now outline. First, such surfaces are divided into orientable and non-orientable 
surfaces. A surface is non-orientable if it is possible to take a clock on a trip ‘round 
the world’ and find, on returning, that its hands turn backwards (its orientation has 
been reversed). The most famous example is the Möbius strip, obtained by taking a 
strip of paper, giving one end a 180° twist, and joining the ends. It is not closed; 
but by stitching up the boundary in either of two possible ways, we obtain the Klein 
bottle and the real projective plane, both closed and non-orientable. A surface is 
orientable if this phenomenon cannot occur. The sphere is an example. Another is 
the torus, obtained by forming a cylinder (by joining the ends of a strip without 
a twist) and then bending it round and sewing up the ends without a twist. The 
classification asserts: 


(18.6.6) Classification of closed surfaces 
(a) An orientable closed surface is homeomorphic to a ‘sphere with 
g handles’, for some g ≥ 0. 
(b) A non-orientable closed surface is homeomorphic to a ‘sphere 
with c cross-caps’, for some c > 0. 


A handle is just like the handle of a teacup, so that a sphere with one handle is 
a torus. Another metaphor is a bridge. (If a graph drawn in the plane has two 
edges which cross, then the crossing can be removed by replacing the level-crossing 
by a bridge. So the class of graphs embeddable on the torus is larger than for the 
sphere.) A cross-cap is more mysterious; it is like a black hole such that, if you enter 
the event horizon at one point, you instantly find yourself leaving at the opposite 
point with your orientation reversed.⁷ (This also gives a mechanism for resolving 
crossings.) 


6 A topologist is someone who can't distinguish his doughnut from his teacup. 


7 It is instructive at this point to compare the topologist’s ‘real projective plane’ with the geometer’s 
(Chapter 9). To a geometer, the points are the lines through the origin in R^3. Each affine point 
(i.e., line not in the equatorial plane) can be represented by the unique point where it meets the 
southern hemisphere of the unit sphere; points at infinity correspond to antipodal pairs of points on 
the equator. This can be realised by taking a cross-cap covering the entire northern hemisphere. 


An embedding of a graph in a surface is called simple if each face is homeomorphic 
to a disc. Not all embeddings are simple. For example, take a graph in the 
plane; draw it inside a small disc, and paste the disc onto a torus. Now the ‘infinite 
face’ is not a disc (in topologists’ language, it is not simply-connected). For simple 
embeddings, there is a generalisation of Euler’s Theorem: 


(18.6.7) Euler’s Theorem for surfaces. Suppose that a simple embedding of a graph 
in a surface S has V vertices, E edges and F faces. 

(a) If S is a sphere with g handles, then V — E + F = 2 — 2g. 

(b) If S is a sphere with c cross-caps, then V − E + F = 2 − c. 


The number on the right-hand side of each of these equations is called the Euler 
characteristic of the relevant surface. Euler's Theorem gives restrictions on graphs 
embeddable in a surface, by the same argument as in (18.6.3). Sometimes, exact 
bounds can be obtained. 


(18.6.8) Ringel-Youngs Theorem. K_n can be embedded in a sphere with g handles 
if and only if 

\[ n \le \tfrac{1}{2}\bigl(7 + \sqrt{48g + 1}\bigr). \]


PROOF. K_n has n vertices, n(n − 1)/2 edges, and so at most n(n − 1)/3 faces (arguing 
as before). So 

\[ n - n(n-1)/2 + n(n-1)/3 \ge 2 - 2g. \]


Rearranging as a quadratic in n, we find the inequality of the theorem. Now it is 
necessary to construct a complete graph with ⌊½(7 + √(48g + 1))⌋ vertices, embedded 
in a sphere with g handles. This is the content of a long project by Ringel and 
Youngs. For g = 0, 1, the formula gives 4 and 7 respectively. Exercise 9 asks you to 
show that K_7 is embeddable in a torus. 
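
The bound in the theorem is easy to tabulate. The sketch below computes ⌊½(7 + √(48g + 1))⌋ in exact integer arithmetic, using the fact that the floor is unchanged when the square root is replaced by its integer part; the function name is an illustrative choice.

```python
from math import isqrt


def max_complete_graph(g):
    """Largest n with n <= (7 + sqrt(48g + 1)) / 2, computed exactly."""
    return (7 + isqrt(48 * g + 1)) // 2


print([max_complete_graph(g) for g in range(5)])   # [4, 7, 8, 9, 10]
```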


In general, the class of graphs embeddable in a surface S is minor-closed. 
According to the Robertson-Seymour Theorem (18.6.5), it is characterised by a 
finite set of excluded minors. But this set can be quite large. For example, 35 
excluded minors are required to characterise graphs embeddable in the projective 
plane, and over 800 for the torus! 


One of the main areas of interest in topological graph theory is the connection 
with colouring problems. Any plane map can be described by a graph whose 
vertices are the countries, with edges joining countries which share a boundary. (If 
two countries share several unconnected segments of boundary, use multiple edges.) 
Now a colouring of the map is the same thing as a vertex colouring of the graph. 
The famous four-colour problem was resolved in 1976 by Appel and Haken, with the 
help of extensive computation: 


(18.6.9) Appel-Haken Theorem, or Four-colour Theorem. Any planar graph has a 
vertex colouring with four colours. 


It is impossible to summarise here the techniques used; but Appel and Haken, 
and others, have written several good accounts. On the other hand, we prove in the 
next section that five colours suffice; the proof illustrates the basic ideas which grew 
into the Appel-Haken proof. 

In this case, it is trivial that there are maps which require four colours, but very 
difficult to show that no more than four are needed. For other orientable surfaces, 
the difficulty is the other way around. It is fairly straightforward to give an upper 
bound for the number of colours needed. This bound turns out to be precisely the 
number in the Ringel-Youngs Theorem! This theorem guarantees that a complete 
graph of the appropriate size is embeddable in the surface, and it requires as many 
colours as it has vertices. We conclude: 


(18.6.10) Map Colouring Theorem. The minimum number of colours required for a 
vertex colouring of any graph embeddable in the sphere with g handles is 
⌊½(7 + √(48g + 1))⌋. 


18.7. Project: The Five-colour Theorem 


In this section, I show that a planar graph can be coloured with five colours. The 
argument is due to Kempe, who thought (incorrectly) that he had proved the Four- 
colour Conjecture. The mistake was pointed out by Heawood, who salvaged the 
Five-colour Theorem (and more). See J. J. Watkins and R. J. Wilson, Graphs: An 
Introductory Approach (1990), for further discussion. 


(18.7.1) Five-colour Theorem. A planar graph has a vertex colouring with five colours. 


PROOF. The proof is by induction on the number of vertices. We assume the result for graphs with 
fewer vertices than Γ. We also assume that Γ is drawn in the plane, and that Γ has no repeated edges 
(since these don't affect the chromatic number). 

Let Γ have V vertices, of which n_i have valency i for each i; let there be E edges and F faces. 
Now, as in the proof of (18.6.3), we have 2E ≥ 3F. Counting vertices and incident vertex-edge pairs, 

\[ \sum_i n_i = V, \qquad \sum_i i\,n_i = 2E. \]

From Euler’s Theorem, we conclude that 

\[ \sum_i (6 - i)\,n_i \ge 12. \]

The left-hand side of this inequality must be positive; so n_i > 0 for some i < 6, whence Γ contains 
a vertex of valency at most 5. 
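
The step from Euler’s Theorem to this inequality can be written out as follows: multiply V − E + F = 2 by 6, use 3F ≤ 2E in the form 6F ≤ 4E, and then substitute the two counting identities.

```latex
\begin{align*}
12 = 6V - 6E + 6F
   \le 6V - 6E + 4E
   = 6V - 2E
   = 6\sum_i n_i - \sum_i i\,n_i
   = \sum_i (6 - i)\,n_i .
\end{align*}
```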

Let v be such a vertex. By induction, $\Gamma - v$ has a colouring with five colours 1, 2, 3, 4, 5. If not
all colours are used on the neighbours of v, then there is a free colour which can be applied to v.
So we may assume that v has valency 5, and that all its neighbours have different colours. Let the
neighbours be $z_1, \dots, z_5$ in anticlockwise order, where we may assume that $z_i$ has colour i.

Let S be the set of all vertices which can be reached from $z_1$ by a path using vertices with
colours 1 and 3 only. Then we can legitimately interchange colours 1 and 3 throughout the set S,
without affecting the property that adjacent vertices have different colours. If $z_3 \notin S$, then after this
interchange no neighbour of v has colour 1, and we can use this colour for v. So we may suppose
that $z_3 \in S$. Thus, there is a path $z_1, x_1, \dots, x_k, z_3$ consisting of vertices with colours 1 and 3.
Adjoining v to this path, we obtain a simple closed curve C.

By the Jordan Curve Theorem, C divides the plane into two parts, and clearly $z_2$ and $z_4$ lie in
different parts; suppose that $z_2$ is inside C. Let T be the set of vertices which can be reached from
$z_2$ by a path using vertices with colours 2 and 4 only. No such path can cross C, so T lies wholly
inside C, and $z_4 \notin T$. Then we can interchange the colours 2 and 4 throughout T, freeing colour 2
for use on v.
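
The proof is constructive, and it may help to see it as a procedure. The sketch below (Python) is one possible rendering, under two assumptions that are not in the text: the graph is given only as a dictionary of neighbour sets, with no drawing in the plane, and so, instead of reading the neighbours in anticlockwise order, the code simply tries every pair of neighbours until it finds one whose two-coloured chain can be swapped. The proof above guarantees that such a pair exists when the graph is planar. The names kempe_chain and five_colour are illustrative.

```python
def kempe_chain(adj, colour, start, other):
    """Vertices reachable from `start` through vertices coloured
    colour[start] or `other` (uncoloured vertices are ignored)."""
    pair = {colour[start], other}
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in seen and colour.get(w) in pair:
                seen.add(w)
                stack.append(w)
    return seen


def five_colour(adj):
    """Colour a simple planar graph, given as {vertex: set of neighbours}."""
    # Peel off a vertex of least valency (at most 5 in a planar graph) each time.
    remaining = {v: set(ns) for v, ns in adj.items()}
    removed = []
    while remaining:
        v = min(remaining, key=lambda u: len(remaining[u]))
        nbrs = remaining.pop(v)
        for u in nbrs:
            remaining[u].discard(v)
        removed.append((v, nbrs))

    colour = {}
    for v, nbrs in reversed(removed):  # put vertices back, last removed first
        free = set(range(1, 6)) - {colour[u] for u in nbrs}
        if free:
            colour[v] = min(free)
            continue
        # All five colours occur on the neighbours: find neighbours a, b such
        # that the two-coloured chain from a misses b, and swap along it.
        swap = next(((a, b) for a in nbrs for b in nbrs if a != b
                     and b not in kempe_chain(adj, colour, a, colour[b])), None)
        if swap is None:
            raise ValueError("no Kempe swap available; is the graph planar?")
        a, b = swap
        ca, cb = colour[a], colour[b]
        for u in kempe_chain(adj, colour, a, cb):
            colour[u] = cb if colour[u] == ca else ca
        colour[v] = ca  # colour ca is now absent from the neighbours of v
    return colour
```

Removing a vertex of least valency and then restoring the vertices in reverse order mirrors the induction: each restored vertex has at most five neighbours that are already coloured.
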


18.8. Exercises 




1. Find the clique number and the chromatic number of (a) the complement of the
n-cycle $C_n$; (b) the Petersen graph.


2. Find the chromatic polynomial of the path $P_n$ and of the cycle $C_n$ with n vertices.


3. (a) Let $x_1 \ge \dots \ge x_m$ and $y_1 \ge \dots \ge y_n$ be positive integers. Show that the
following are equivalent:

• there is a bipartite graph with valencies $x_1, \dots, x_m$ in one bipartite block and
$y_1, \dots, y_n$ in the other;

• there is a matrix with entries 0 and 1 only, having row sums $x_1, \dots, x_m$ and
column sums $y_1, \dots, y_n$.

(b) Recall the notion of partition of an integer, conjugate partitions, and natural
partial order of partitions from Section 13.1. Use the Gale-Ryser Theorem (18.1.4)
to show that, if $\lambda$ and $\mu$ are partitions of the same integer, then there is a zero-one
matrix whose row sums are the parts of $\lambda$ and whose column sums are the parts of
$\mu$ if and only if $\mu \le \lambda^*$.

REMARK. In fact, with the notation of Section 13.6, if we express the elementary
symmetric polynomial $e_\lambda$ in terms of the basic polynomials $m_\mu$ by

$e_\lambda = \sum_{\mu \vdash n} a_{\lambda\mu} m_\mu$,

then $a_{\lambda\mu}$ is equal to the number of zero-one matrices whose row sums form the
partition $\lambda$ and whose column sums form the partition $\mu$. (This is the content of
Exercise 10 of Chapter 13.) We showed in the proof of Newton’s Theorem (13.5.1)
that $a_{\lambda\mu} = 0$ unless $\mu \le \lambda^*$; the Gale-Ryser Theorem asserts the converse, viz., if
$\lambda \le \mu^*$ then $a_{\lambda\mu} > 0$.


4. A graph $\Gamma$ is called an interval graph if its vertices are a collection of non-empty
intervals of the real line $\mathbb{R}$, with two vertices adjacent if and only if they have
non-empty intersection. Interval graphs are useful in modelling time-dependent
phenomena. By slight perturbations of the endpoints of the intervals, we may
assume that these endpoints are all distinct, and that the intervals are closed.

(a) Prove that interval graphs are perfect. [HINT: Given a set of intervals, let
n(x) be the number of intervals containing the real number x. The clique number
is the maximum value of this function. Take the smallest x at which the maximum
is attained; then some interval in the collection starts at x. Repeat at the smallest
x not yet covered at which the maximum is attained, as long as one exists. In this
way, we construct a coclique covering every x at which n(x) is maximum. Now use
induction.]

(b) Prove that complements of interval graphs are perfect. [HINT: Let $C_1, \dots, C_m$
be the cocliques of maximum size in an interval graph. Let $x_i$ be the right-hand end
of the leftmost interval in $C_i$, and let x be the minimum of $x_1, \dots, x_m$. Show that
the leftmost interval of each $C_i$ contains x.]


5. Let g be a permutation of {1,...,n}. The permutation graph defined by g has
vertex set {1,...,n}; its edges are all the pairs whose order is reversed by g (that is,
all {i,j} with i < j and ig > jg).

(a) Prove that the complement of a permutation graph is a permutation graph.

(b) Recall the dimension of a poset (Section 12.6). Prove that a graph is a 
permutation graph if and only if it is the incomparability graph of a poset of 
dimension at most 2.

(c) Prove that a graph is a permutation graph if and only if it is both a 
comparability graph and an incomparability graph. 

(d) Prove that a permutation graph is perfect. Can you find a direct argument? 
[HINT: Use the argument of Erdős and Szekeres used to prove (10.5.1).] 


6. A graph is called N-free if it doesn’t contain the path on 4 vertices as an induced 
subgraph. (See the Hasse diagram of the poset N in Fig. 12.1.) 

(a) Show that the complement of an N-free graph is N-free. 

(b) Show that an N-free graph is connected if and only if its complement is 
disconnected. 

(c) Prove that the class of N-free graphs is the smallest class containing the
1-vertex graph and closed under disjoint union and complementation. (In other 
words, any N-free graph can be built from 1-vertex graphs by these operations.) 

(d) Hence show that N-free graphs are perfect. 


7. Show that an N-free graph is a comparability graph. [Hint: Exercise 6(c).] Hence 
show that an N-free graph is a permutation graph. 


8. Show that the Petersen graph belongs to Class 2. 


9. Find an embedding of $K_7$ in a torus, and an embedding of the Petersen graph in
the real projective plane. 


10. Show that any finite graph can be embedded in $\mathbb{R}^3$ so that edges are represented
by straight line segments. [HINT: Consider points $(t, t^2, t^3)$.]


11. Show that a plane triangulation with no vertex of valency less than 5 has at least 
12 vertices of valency 5. Construct an example with exactly 12 vertices of valency 5, 
and colour it with four colours. 


19. The infinite 


In the Middle Ages the problem of infinity was of interest mainly in connection 
with arguments about whether the set of angels who could sit on the head of 
a pin was infinite or not. 


N. Ya. Vilenkin, Stories about Sets (1965) 


... the true mathematician and physicist know very well that the realms of
the small and the great often obey quite different rules. 


Kurt Singer, Mirror, Sword and Jewel (1973) 


TOPICS: Set theory, cardinal and ordinal numbers; König’s Infinity
Lemma, Zorn’s Lemma and equivalents; infinite Ramsey Theorem;
the ‘random graph’

TECHNIQUES: Transfinite induction; free constructions; back-and-forth;
probabilistic existence proofs

ALGORITHMS:

CROSS-REFERENCES: SDRs (Chapter 6); projective planes (Chapters 7, 9);
Steiner triple systems (Chapter 8); posets (Chapter 12);
graph colourings (Chapter 18)


Counting is a less precise tool for infinite sets than for finite ones. The shepherdess 
who can count her flock of a hundred sheep will know if the wolf has taken one; 
but, if she has an infinite flock, she won't notice until almost all of her sheep have 
been lost. 

Nevertheless, combinatorics depends on counting. So, in the first section, you 
will find a quick tour through set theory and the two kinds of numbers used for 
infinite counting. 

The remainder of the chapter describes some topics in infinite combinatorics. 
Most of these could be described as ‘climbing up from the finite’; truly infinite 
reasoning is more recondite and is done mostly by set theorists. 


19.1. Counting infinite sets 


This section gives a very brief account of set theory and cardinal and ordinal numbers.
It is no substitute for a textbook account (such as K. J. Devlin’s Fundamentals
of Contemporary Set Theory), however.

Before beginning infinite combinatorics, we must look at how to count infinite
sets. We will see that two kinds of counting (progressing in order from one number
to the next, and measuring the ‘size’ of a finite set), which are essentially the same
in the finite case, have to be distinguished. First, though, there are two more
fundamental difficulties: What is an infinite set? And, anyway, what is a set?

In Chapter 2, I took the point of view that we understand the natural numbers 
from our early experience with counting. In much the same way, we have intuition 
about sets (or ‘collections’, ‘classes’, or ‘ensembles’ of objects) against which to test our 
conclusions. However, Russell’s paradox demonstrates that we cannot uncritically 
allow any collection of elements to form a set, or we introduce contradictions into 
the foundations of mathematics.¹

The basic idea adopted to rectify this problem is that we start with some
collection of fundamental objects or ‘urelemente’ which are not themselves sets,
and then construct sets in stages: at each stage, we can gather together objects
constructed at earlier stages into sets.² Logicians prefer to build the mathematical
universe out of nothing, and traditionally start with the empty set of objects. It
is not sufficient just to go through stages 1, 2 and so on (indexed by the natural
numbers), since the sets we would obtain would all be finite. (Beginning with $\emptyset$, at
the first stage, we get $\{\emptyset\}$; at the second, $\{\{\emptyset\}\}$ and $\{\emptyset, \{\emptyset\}\}$; etc.) We must continue
the construction into the transfinite, and need infinite sets to describe the stages
properly. To avoid circularity, mathematicians adopted an axiomatic approach.
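
The first few finite stages can be generated mechanically. The following is a rough sketch in Python, under the assumption that ‘gathering together’ at each stage means forming every subset of what has been constructed so far; frozensets stand in for sets, and the names powerset and stages are illustrative only.

```python
from itertools import chain, combinations


def powerset(universe):
    """All subsets of `universe`, each as a frozenset."""
    items = list(universe)
    return {frozenset(c) for c in chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1))}


def stages(count):
    """Yield the collections of sets first constructed at stages 1, ..., count."""
    universe = {frozenset()}          # the single starting object: the empty set
    for _ in range(count):
        bigger = powerset(universe)   # gather earlier objects into sets
        yield bigger - universe       # keep only the sets that are new
        universe = bigger


if __name__ == "__main__":
    for number, new_sets in enumerate(stages(3), start=1):
        print("stage", number, "adds", len(new_sets), "sets")
```

Every set produced in this way is finite, which is the point made above: the finite stages alone never yield an infinite set.
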

However, logicians know well that axioms³ can never entirely capture a mathematical
structure. Kurt Gödel showed that, if a structure has at least the richness
of the natural numbers (with their ordering, addition, and multiplication), then any
set of axioms which can be written down (actually or potentially) is ‘incomplete’:
some assertions about the structure can be neither proved nor disproved using these
axioms. Subsequent work showed that no infinite structure can be completely specified
by axioms; there will always be other structures satisfying the same axioms.
So, if we decide to base set theory on axioms, we must be prepared for there to be
different ‘set theories’, and statements which are true in some and false in others.

It is worth sparing a moment to see why the ambiguities come in. In terms of our 
intuition, the gathering of elements into sets in each stage is not precisely defined, 
and there is room for manœuvre on what subsets are included. Now everything 
has a set-theoretic description. An ordered pair is a set (the standard definition is 
$(x, y) = \{\{x\}, \{x, y\}\}$⁴); a function is a set of ordered pairs. In particular:

• We want to say that two sets have the same number of elements if there is
a bijection between them. But different set theories have different bijections;
so two sets may have the same number of elements in one theory and not in
another.

• The notorious Axiom of Choice asserts that given any ‘family’ (that is, set) of
non-empty sets, we can choose representatives of the sets. More formally, if $A_i$
is a non-empty set for each i in some index set I, then there exist elements $a_i$ for
$i \in I$ such that $a_i \in A_i$ for each $i \in I$. Now these elements are described by a function
from I to the union of the sets $A_i$; this function may be present in some models
but not others.


1 As Bertrand Russell wrote to Gottlob Frege, ‘Consider the set of all sets which are not members
of themselves. Is it a member of itself?’

2 This procedure avoids Russell’s paradox: the elements of Russell’s ‘set’ continue appearing at every
stage in the construction, so there is no stage at which they all exist to be gathered into a set at the
next stage.

3 The logical system used for the discussion here is ‘first-order logic’, widely accepted as the best
logical basis for mathematics.

4 The important feature of this definition is that $(x, y) = (u, v)$ if and only if $x = u$ and $y = v$. Any
set-theoretic construct with this property would do.

Gödel showed that the Axiom of Choice is consistent (it cannot be disproved
from the other axioms). He did this by constructing a model or universe in which
the collections of elements which can be gathered into a set at any stage are those
satisfying some formula of logic (this is called the ‘constructible universe’), and
showing that the Axiom of Choice holds in this model. Later, Cohen showed by a
technique known as ‘forcing’ that it is independent (it cannot be proved either).

Since there is no way of resolving questions like ‘Is the Axiom of Choice true?’ 
on the basis of the standard axioms, the only hope of progress is to try to refine our 
intuition about what set theory is, until perhaps there is general agreement about the 
need for a new axiom which would decide some of these questions. In the meantime 
we explore consequences of these statements and of their negations.⁵ Many of these
consequences are of a combinatorial nature. 

Now what about counting? Corresponding to the ‘stages’ in the construction of 
sets, we define a transfinite sequence of numbers, the ordinal numbers, as follows. 
The empty set is an ordinal number (the number 0); if the set n is a number then so 
is n U {n} (this number, representing n + 1, will be constructed at the stage after n); 
and, to enable us to leap up into the transfinite, a ‘transitive set’ of ordinal numbers 
(containing all members of its members) is itself an ordinal number. This condition, 
for example, allows us to gather up all the natural numbers into a single ordinal 
number, the first infinite ordinal, conventionally called $\omega$. ($\omega$ is a transitive set,
since by construction the members of any ordinal number are the smaller ordinal 
numbers.) 

In the construction, we distinguish three kinds of ordinals: 

• zero, or $\emptyset$;
• successor ordinals, of the form $n + 1 = n \cup \{n\}$;
• limit ordinals, with no immediate predecessor, obtained by the ‘gathering up’
procedure.
Now we can say that the ‘stages’ of the intuitive construction of sets are indexed by 
the ordinal numbers. 
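
The finite part of this construction can be carried out literally. Here is a small sketch in Python, with frozensets playing the role of sets; the names successor, finite_ordinals and is_transitive are illustrative, and of course no program of this kind reaches the limit ordinals.

```python
def successor(n):
    """The ordinal n + 1, defined as n together with the extra member n."""
    return n | frozenset([n])


def finite_ordinals(count):
    """The first `count` ordinals 0, 1, 2, ..., starting from the empty set."""
    n = frozenset()
    ordinals = []
    for _ in range(count):
        ordinals.append(n)
        n = successor(n)
    return ordinals


def is_transitive(s):
    """True if every member of a member of s is itself a member of s."""
    return all(member <= s for member in s)


if __name__ == "__main__":
    for k, n in enumerate(finite_ordinals(5)):
        print(k, "has", len(n), "members; transitive:", is_transitive(n))
```

The members of each ordinal built this way are exactly the smaller ordinals, so the number n, viewed as a set, has n members.
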

Having defined the ordinal numbers, we have (almost by definition) the principle
of transfinite induction:


(19.1.1) Transfinite induction
Suppose that P is a property of ordinal numbers. Assume
• P(0) holds;
• if P(n) holds, then P(n + 1) holds;
• if n is a limit ordinal and P(m) holds for all m < n, then P(n)
holds.
Then P(n) holds for all ordinal numbers n.


5 We do mathematics with the Axiom of Choice or with its negation, in much the same way that we
do Euclidean or non-Euclidean geometry.


Transfinite induction can be used in constructions as well as proofs, just as the 
more usual induction in Chapter 2. 

Ordinal numbers capture the notion of succession. But they don’t measure
the size of a set. Hilbert’s hotel⁶ illustrates this. Consider a hotel with $\omega$ rooms
(numbered 0, 1, 2, ...). One day, when all the rooms are full, a new guest arrives.
To accommodate him, the manager simply moves each guest into the next room
along, freeing room 0 for the newcomer. Next day, infinitely many new guests arrive.
Undeterred, the manager shifts the guest from room n into room 2n for each n,
freeing the odd-numbered rooms for the new arrivals.
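
The manager's two rules are just the maps n ↦ n + 1 and n ↦ 2n. The sketch below (Python) checks their behaviour on an initial segment of the rooms only, since a program cannot inspect all of them; the names admit_one and admit_many, and the choice of 1000 rooms, are illustrative assumptions.

```python
def admit_one(n):
    """New room for the guest in room n when one newcomer arrives."""
    return n + 1


def admit_many(n):
    """New room for the guest in room n when infinitely many newcomers arrive."""
    return 2 * n


rooms = range(1000)                       # a finite window onto the hotel
after_one = {admit_one(n) for n in rooms}
after_many = {admit_many(n) for n in rooms}

if __name__ == "__main__":
    print("room 0 free after first move:", 0 not in after_one)
    print("odd rooms free after second move:",
          all(m % 2 == 0 for m in after_many))
```

Both maps are one-to-one, so no two guests ever share a room.
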

As we saw already, two sets have the same cardinality if there is a bijection
between them.⁷ Hilbert’s hotel shows that there is a bijection between $\omega$ and $\omega + 1$,
and also between $\omega$ and $\omega + \omega$. So the ordinal numbers are too discriminating. There
are two ways to proceed:

We may decide that, having defined what it is for sets to have the same 
cardinality, we have implicitly defined the cardinality of a set. Roughly speaking, 
cardinalities are equivalence classes for the relation ‘same cardinality’; but care is 
required, since the equivalence classes are not sets (by the same reasoning as in 
Russell’s paradox; singletons, for example, continue appearing at all stages). 

An alternative approach depends on the fact: 


Any non-empty set of ordinal numbers has a least element. 


(This is proved by transfinite induction in the same way that the same assertion
for the natural numbers is proved by induction; see Chapter 2.) Now, given any
set X, the set of all those ordinal numbers which are bijective with X has a least
element (if there are any such numbers!), and we take this least element to be the
cardinality of X. In other words, a cardinal number is an ordinal number which
is not in one-to-one correspondence with any smaller ordinal number. With this
approach, all natural numbers, and $\omega$, are cardinal numbers, but $\omega + 1$ and $\omega + \omega$
are not.


6 As described in Stanislaw Lem’s story ‘The Interstellar Milkman, Jon the Quiet’. See N. Ya. 
Vilenkin, Stories about Sets (1965). 
7 Thus, a set is countable if and only if it is bijective with $\mathbb{N}$.

In this approach, a set is finite if and only if it is bijective with some natural
number. This is precisely how natural numbers are used in ordinary counting (as
‘standard sets’ of each possible size); our approach generalises this to the transfinite.

An alternative notation for cardinal numbers is due to Cantor, the ‘aleph
notation’. ($\aleph$ (aleph) is the first letter of the Hebrew alphabet.) Using transfinite
induction, we define $\aleph_n$ for all ordinal numbers n by the rules:

• $\aleph_0 = \omega$;

• $\aleph_{n+1}$ is the next cardinal number after $\aleph_n$;

• if a is a limit ordinal and $\aleph_b$ is defined for all b < a, then $\aleph_a$ is the least cardinal
number exceeding all these.

Life is simplified by the following theorem of Zermelo: 


(19.1.2) Well-ordering Theorem. The Axiom of Choice is equivalent to the assertion 
that every set admits a one-to-one function onto some ordinal number.


Thus, if we assume that our set theory satisfies the Axiom of Choice (as is almost 
universally done), then every set has a unique cardinal number. 

Cardinal numbers, being special ordinal numbers, are totally ordered. We have
$a \le b$ if there is a one-to-one function from a set of cardinality a into one of
cardinality b. It follows from (19.1.2) that, assuming the Axiom of Choice, given any
two sets, there is a one-to-one function from one to the other (in some order!).

We can do arithmetic with cardinal numbers. If A and B are disjoint sets with
cardinalities a and b respectively, then $a + b$, $a \cdot b$ and $a^b$ are the cardinalities of $A \cup B$,
$A \times B$ (Cartesian product), and $A^B$ (the set of functions from B to A) respectively.
(Representing subsets of B by their characteristic functions, we see that $2^b$ is the
cardinality of the power set of B.) But the rules are a bit different. The next result
assumes the Axiom of Choice.
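
For finite sets these definitions agree with ordinary arithmetic, and this can be checked by listing the functions directly. The following is a brief sketch in Python; the helper names functions and subset_of, and the particular sets A and B, are illustrative assumptions.

```python
from itertools import product


def functions(B, A):
    """All functions from B to A, each represented as a dict."""
    domain = list(B)
    return [dict(zip(domain, values))
            for values in product(list(A), repeat=len(domain))]


def subset_of(char_fn):
    """The subset of the domain whose characteristic function is char_fn."""
    return frozenset(x for x, bit in char_fn.items() if bit == 1)


A, B = {"p", "q", "r"}, {1, 2}

if __name__ == "__main__":
    print(len(functions(B, A)))                             # |A^B| = 3^2
    print(len({subset_of(f) for f in functions(B, {0, 1})}))  # subsets of B
```

Distinct characteristic functions give distinct subsets, which is why the power set of B has cardinality 2 to the power b.
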


(19.1.3) Proposition. (a) If a and b are infinite, then $a + b = a \cdot b = \max(a, b)$.
(b) If a > 1, then $a^b > b$ for all b.


In particular, $2^a > a$ for all a. It is known that $2^{\aleph_0}$ is the cardinality of the set of
real numbers. Cantor’s continuum hypothesis is the assertion that $2^{\aleph_0}$ is the smallest
uncountable cardinal number; in other words, no subset of $\mathbb{R}$ has cardinality strictly
between those of $\mathbb{N}$ and $\mathbb{R}$. (In aleph notation, $2^{\aleph_0} = \aleph_1$.) More generally, the
generalised continuum hypothesis (GCH) asserts that, for any infinite cardinal number b, $2^b$
is the next cardinal after b (or $2^{\aleph_n} = \aleph_{n+1}$ for all ordinals n). It is known that the
GCH is undecidable; it holds in Gödel’s constructible universe, but there are models
in which it is false.


19.2. König’s Infinity Lemma


The Axiom of Choice (which I will abbreviate to AC) is fundamental to most infinite 
combinatorics, and I will assume that it holds. However, some students, learning 
about it for the first time, overrate its influence, and worry that it is being invoked in 
an argument along the lines, ‘The set X is non-empty, so choose an element $x \in X$
...’. This does not require AC; one choice, or indeed finitely many choices, are
permitted by the other axioms. Similarly, AC is not required if there is a rule for
making the choices.⁸ Only when infinitely many genuine free choices must be
made is AC required.

Often we have a situation where later choices depend on earlier ones. The
so-called ‘Principle of Dependent Choice’ allows us to make such choices; it is a
consequence of AC. (AC allows us to choose an element from any set which could
conceivably arise in the process.) In this form, it is invoked in proving a result
which is very useful in applying AC to combinatorics: König’s Infinity Lemma.

A one-way infinite path in a digraph D is a sequence $v_0, v_1, v_2, \dots$ of distinct
vertices such that $(v_i, v_{i+1})$ is an edge for all $i \ge 0$.


(19.2.1) König’s Infinity Lemma. Let $v_0$ be a vertex of a digraph D. Suppose that
(a) every vertex has finite out-valency;
(b) for every positive integer n, there is a path of length n beginning at $v_0$.
Then there is a one-way infinite path beginning at $v_0$.


REMARK. The result is false if condition (a) is relaxed. Take a path of length n for 
every finite n, all starting at the same point, but otherwise disjoint. 


PROOF. We call a vertex v of D good if, for every n, a path of length n starts at v. 


We claim: if v is good, then there exists v' such that (v, v') is an edge and v'
is a good vertex of $D - v$.

For let $w_n$ be the next point after v on a path of length n starting at v. Since there
are only finitely many vertices x for which (v, x) is an edge, one of them, say v',
must occur infinitely often as $w_n$. This means that there are arbitrarily long finite
paths starting at v', and hence paths of all finite lengths starting there, none of
which contain v.

Now, by assumption, $v_0$ is good. For each $i \ge 0$, choose $v_{i+1}$ so that $(v_i, v_{i+1})$
is an edge and $v_{i+1}$ is a good vertex of $D - \{v_0, \dots, v_i\}$. Then $v_0, v_1, v_2, \dots$ is the
required one-way infinite path.

Another infinite principle was invoked in the proof of the Claim above: an 
infinite form of the Pigeonhole Principle (cf. (10.1.1)). 


(19.2.2) Pigeonhole Principle (infinite form)
If infinitely many objects are divided into finitely many classes, then
some class contains infinitely many objects.


The infinite form of Ramsey’s Theorem is a generalisation of this; see Section
19.4.

Now we give an application of König’s Infinity Lemma, showing how it can be
used to transfer information between the finite and the (countably) infinite.


8 For example: if a drawer contains infinitely many pairs of shoes and we must choose
one shoe from each pair, we can take all the left shoes. But, for infinitely many pairs of socks, AC is
required.


(19.2.3) Proposition. Let $\Gamma$ be a countably infinite graph. Suppose that any finite
induced subgraph of $\Gamma$ has a vertex colouring with r colours. Then $\Gamma$ has a vertex
colouring with r colours.


PROOF. Let $v_1, v_2, \dots$ be the vertices of $\Gamma$. For each n, let $C_n$ be the (non-empty) set
of all vertex colourings of the induced subgraph on $\{v_1, \dots, v_n\}$ with the r colours
$1, \dots, r$. Form a digraph D with $\bigcup_{n \ge 0} C_n$ as vertex set (we take $C_0$ to be a singleton
whose only member $c_0$ is the empty set!) and with edges as follows: for $c_n \in C_n$
and $c_{n+1} \in C_{n+1}$, let $(c_n, c_{n+1})$ be an edge if and only if $c_n$ is the restriction of the
colouring $c_{n+1}$ to the vertices $v_1, \dots, v_n$. Then each vertex has out-valency at most
r (since at most r colours can be applied to $v_{n+1}$ if $v_1, \dots, v_n$ are already coloured).
Moreover, $d(c_0, c_n) = n$ for all $c_n \in C_n$.

So the hypotheses of König’s Infinity Lemma are satisfied. We conclude that
there is a one-way infinite path $c_0, c_1, c_2, \dots$. This gives us a rule for colouring all
the vertices of $\Gamma$; for $v_n$ is assigned a colour in $c_n$, and by definition it gets the
same colour in all $c_m$ for m > n. Moreover, it is a legitimate vertex colouring;
for, if $\{v_i, v_j\}$ is an edge, then $v_i$ and $v_j$ are assigned different colours in $c_n$, where
n = max(i, j).
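
The digraph D of partial colourings can be explored by machine when the graph is finite. The sketch below (Python) is a finite truncation of the argument and not the infinite lemma itself: it works with N vertices numbered 0, ..., N − 1, treats a partial colouring as ‘good’ if it extends to a colouring of all N vertices, and then walks from the empty colouring through good vertices. The names proper, levels and konig_path are illustrative.

```python
from itertools import product


def proper(col, edges):
    """True if the tuple `col` properly colours the vertices 0..len(col)-1."""
    n = len(col)
    return all(col[i] != col[j] for i, j in edges if i < n and j < n)


def levels(num_vertices, edges, r):
    """C_n for n = 0..num_vertices: proper r-colourings of the first n vertices."""
    return [[c for c in product(range(r), repeat=n) if proper(c, edges)]
            for n in range(num_vertices + 1)]


def konig_path(num_vertices, edges, r):
    """A path c_0, c_1, ..., c_N in which each colouring restricts to the last."""
    C = levels(num_vertices, edges, r)
    good = set(C[-1])                     # colourings that reach the last level
    for n in range(num_vertices - 1, -1, -1):
        good |= {c for c in C[n] if any(c + (k,) in good for k in range(r))}
    path = [()]                           # c_0 is the empty colouring
    while len(path[-1]) < num_vertices:
        c = path[-1]
        # Raises StopIteration if the graph has no r-colouring at all.
        path.append(next(c + (k,) for k in range(r) if c + (k,) in good))
    return path
```

For instance, konig_path(5, [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)], 3) walks down to a 3-colouring of the 5-cycle. In the infinite case no last level exists, which is exactly why the lemma is needed.
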


(19.2.4) Corollary. Any plane map, finite or infinite, can be coloured with four 
colours. 


PROOF. For finite maps, of course, this is the Four-colour Theorem (18.6.9). A plane
map has at most countably many countries, since each country contains a point 
with rational coordinates, and there are only a countable number of such points. So 
the infinite case follows from (19.2.3). 


In fact, (19.2.3) holds for arbitrary infinite graphs, not just countably infinite 
ones. To prove this, we need a stronger principle, Zorn’s Lemma, to be described in 
the next section. 


19.3. Posets and Zorn’s Lemma 


One of the most striking differences between finite and infinite posets is that the 
latter need not have maximal elements, as shown by the natural numbers (for 
example). An important theorem giving conditions under which maximal elements 
exist is Zorn’s Lemma: 


(19.3.1) Zorn’s Lemma. Let P be a non-empty poset. Suppose that every chain in 
P has an upper bound. Then P has a maximal element. 


PROOF. Recall how we showed that every finite poset has a maximal element: if not,
pick an element, and repeatedly pick a larger element, yielding an infinite ascending
chain. The same trick works here. Suppose that $P = (X, \le)$ has no maximal element.
By transfinite induction, define elements $x_a$, for all ordinal numbers a, such that
$x_a < x_b$ for a < b. This is done as follows:

• Let $x_0$ be any element of X.
• If $x_a$ is defined, let $x_{a+1}$ be any strictly greater element (this exists since $x_a$ is
not maximal).
• If a is a limit ordinal, then the elements $x_b$ for b < a form a chain; let $x_a$ be an
upper bound for this chain.

Obviously, all the elements $x_a$ are distinct. But this leads to a contradiction: take a
to be a cardinal number greater than the cardinality of X, and there are not enough
elements available in X for such a chain!

Note that we used the Axiom of Choice in this proof: we have to choose each 
term of the series from a set of ‘admissible’ elements. This is in fact inevitable:
Zorn’s Lemma is ‘equivalent to’ AC; the latter can be proved from the former and 
the other axioms of set theory. (See Exercise 3.) 


Here is a fairly typical application of Zorn’s Lemma, to an infinite version of 
(12.2.1): 


(19.3.2) Theorem. Any poset has a linear extension. 


PROOF. Let (X, R) be a poset. We let $\mathcal{R}$ be the set of relations $R' \supseteq R$ for which
(X, R') is a poset, partially ordered by inclusion. We claim:


Every chain in $(\mathcal{R}, \subseteq)$ has an upper bound.


For let C be a chain, and let R' be the union of the members of C (each member
of C being a relation on X, that is, a set of ordered pairs). Then (X, R') is a partial
order. (This involves checking the axioms. The arguments are all similar: here is
the proof of transitivity. Suppose that $(x, y), (y, z) \in R'$. Then, say, $(x, y) \in R_1$ and
$(y, z) \in R_2$ for some $R_1, R_2 \in C$. Since C is a chain, one of these relations contains
the other; say $R_1 \subseteq R_2$. Then $(x, y), (y, z) \in R_2$; so $(x, z) \in R_2$ (because $(X, R_2)$ is a
poset), and $(x, z) \in R'$, as required.) Clearly $R' \supseteq R$, and R' is thus an upper bound
for C in $\mathcal{R}$.

By Zorn’s Lemma, there is a maximal element of $\mathcal{R}$, say R'. We show that
(X, R') is a total order. If it were not, then there would be some pair (a, b) of points
which are incomparable in (X, R'). Now exactly the same argument as in the proof
of (12.2.1) shows that we could enlarge R' to make a and b comparable, by setting
$R'' = R' \cup ({\downarrow}a \times {\uparrow}b)$. But this would contradict the maximality of R'.
So (X, R') is a linear extension of (X, R), as required.
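
For a finite poset no appeal to Zorn's Lemma is needed: the enlargement step can simply be repeated until no incomparable pair remains. The sketch below (Python) does this, under the assumption that the order is given as a set of pairs (x, y) meaning x ≤ y, and that the enlargement adjoins every pair (x, y) with x below a and y above b, as in the proof. The name linear_extension is illustrative.

```python
def linear_extension(X, R):
    """Extend the partial order R on the list X to a total order.

    R is a set of pairs (x, y) meaning x <= y; it must be reflexive,
    antisymmetric and transitive.
    """
    R = set(R)
    while True:
        # Look for a pair of incomparable points.
        pair = next(((a, b) for a in X for b in X
                     if (a, b) not in R and (b, a) not in R), None)
        if pair is None:
            return R                      # every two points are comparable
        a, b = pair
        down_a = {x for x in X if (x, a) in R}   # everything below a
        up_b = {y for y in X if (b, y) in R}     # everything above b
        R |= {(x, y) for x in down_a for y in up_b}
```

Each pass makes at least one new pair comparable, so on a finite set the loop terminates; in the infinite case this step-by-step process is what Zorn's Lemma replaces.
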

Zorn’s Lemma is often conveniently applied in the form of the Propositional Compactness
Theorem, which we now develop with an application.

An ideal in a lattice is a non-empty down-set which is closed under taking joins. Equivalently, I
is an ideal in L if
• $0 \in I$;
• $x, y \in I \Rightarrow x \vee y \in I$;
• $x \in I,\ a \in L \Rightarrow x \wedge a \in I$.
An ideal is proper if it is not the whole of L; equivalently, if it does not contain 1.


(19.3.3) Proposition. Any lattice contains a maximal proper ideal.

PROOF. Straightforward application of Zorn's Lemma to the set of proper ideals, partially ordered
by inclusion. (If no ideal in a chain contains 1, then the union doesn’t contain 1 either.)


Slightly more generally, any proper ideal I in a lattice is contained in a maximal ideal. This is 
proved by modifying the argument to use only the set of proper ideals containing I. 


Our application depends on the following observation: in a Boolean lattice L, if I is a maximal
ideal, then for each $a \in L$, exactly one of a and a' belongs to I. (They cannot both belong, since
their join is 1. If neither lies in I, then the set


$J = \{y : y \le a \vee x \text{ for some } x \in I\}$


is an ideal containing I and a but not a', contradicting maximality.)


Recall the definition of propositional formulae and valuations from Section 12.4. One small
piece of terminology: A set $\Sigma$ of propositional formulae is satisfiable if there is a valuation v such
that $v(\phi) = \mathrm{TRUE}$ for all $\phi \in \Sigma$.


(19.3.4) Propositional Compactness Theorem. Let $\Sigma$ be a set of propositional formulae. Suppose that
every finite subset of $\Sigma$ is satisfiable. Then $\Sigma$ is satisfiable.


PROOF. We work in the Boolean lattice L of equivalence classes of formulae, and identify a formula
with its equivalence class. Let I be the ideal generated by $\Sigma' = \{(\neg\phi) : \phi \in \Sigma\}$: that is, I is the set of
elements of L which lie below some finite disjunction of elements of $\Sigma'$. The hypothesis implies that
$1 \notin I$. For, if $1 \in I$, then 1 would be a (finite) disjunction of elements of $\Sigma'$. By assumption, there is
a valuation giving all these elements the value FALSE; but then 1 would have the value FALSE, which
is impossible.


By the extension of (19.3.3), there is a maximal ideal $I^*$ containing I. Now define a valuation v
by


$v(\phi) = \begin{cases} \mathrm{TRUE} & \text{if } \phi \notin I^*, \\ \mathrm{FALSE} & \text{if } \phi \in I^*. \end{cases}$


Check that v really is a valuation; clearly $v(\phi) = \mathrm{TRUE}$ for all $\phi \in \Sigma$.

The Propositional Compactness Theorem is a more powerful tool than König’s Infinity Lemma,
allowing arguments to be extended to arbitrary infinite cardinality, as we'll see shortly. It is in fact
less powerful than the Axiom of Choice: there are models of set theory in which AC fails but
Propositional Compactness is true.


As an application, we extend (19.2.3) to arbitrary infinite graphs. 


(19.3.5) Proposition. Suppose that every finite subgraph of $\Gamma$ has a vertex colouring with r colours.
Then $\Gamma$ has a vertex colouring with r colours.

PROOF. We take the set
$\{p_{x,i} : x \text{ a vertex of } \Gamma,\ i = 1, \dots, r\}$
of propositional variables. Let $\Sigma$ be the set of formulae of the following types:

• for each vertex x of $\Gamma$, a formula asserting that $p_{x,i}$ is true for exactly one value of i;

• for each edge {x, y} of $\Gamma$, a formula asserting that $p_{x,i}$ and $p_{y,i}$ are not true for the same value
of i.


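For example, if r = 3, these formulae would be


$(p_{x,1} \vee p_{x,2} \vee p_{x,3}) \wedge (\neg(p_{x,1} \wedge p_{x,2}) \wedge \neg(p_{x,1} \wedge p_{x,3}) \wedge \neg(p_{x,2} \wedge p_{x,3}))$

and

$\neg(p_{x,1} \wedge p_{y,1}) \wedge \neg(p_{x,2} \wedge p_{y,2}) \wedge \neg(p_{x,3} \wedge p_{y,3})$

respectively.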
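
For a finite graph, the set of formulae just described can be written down and tested by brute force. The sketch below (Python) is only an illustration of the encoding: formulae are represented as Python predicates on a valuation rather than as syntactic objects, all valuations are tried in turn, and the names colouring_formulae and satisfying_valuations are hypothetical.

```python
from itertools import product


def colouring_formulae(vertices, edges, r):
    """The formulae for (vertices, edges), as predicates on a valuation.

    A valuation is a dict sending each variable (x, i) to True or False.
    """
    colours = range(1, r + 1)
    sigma = []
    for x in vertices:
        # p_{x,i} is true for exactly one value of i
        sigma.append(lambda val, x=x: sum(val[(x, i)] for i in colours) == 1)
    for x, y in edges:
        # p_{x,i} and p_{y,i} are not both true for any i
        sigma.append(lambda val, x=x, y=y:
                     not any(val[(x, i)] and val[(y, i)] for i in colours))
    return sigma


def satisfying_valuations(vertices, edges, r):
    """Yield every valuation making all the formulae true (exhaustive search)."""
    variables = [(x, i) for x in vertices for i in range(1, r + 1)]
    sigma = colouring_formulae(vertices, edges, r)
    for bits in product([False, True], repeat=len(variables)):
        val = dict(zip(variables, bits))
        if all(formula(val) for formula in sigma):
            yield val
```

Each satisfying valuation corresponds to a proper colouring, with vertex x receiving the unique colour i for which the variable (x, i) is true. A triangle, for example, has satisfying valuations for r = 3 but none for r = 2.
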


19.4. Ramsey theory


The infinite form of Ramsey’s Theorem can be stated as follows. 


(19.4.1) Ramsey’s Theorem (infinite form)
Suppose that k and r are positive integers, and let X be an infinite
set. Suppose that the set of k-element subsets of X is partitioned
into r classes. Then there is an infinite subset Y of X, all of whose
k-element subsets belong to the same class.


For example, the case k = 1 is the infinite form of the Pigeonhole Principle
(19.2.2). I will give a proof for k = 2; the general case is an exercise (with hints).


The remainder of this section concerns possible infinite extensions or quantifications
of Ramsey’s Theorem. The proofs are sketched or omitted; you should regard
it as a Project.


There are three natural ways in which we could try to extend Ramsey’s Theorem: 
(a) quantify the two infinities in the statement (as infinite cardinals);
(b) allow infinitely many colours;
(c) colour subsets of infinite size.
These three will be considered in turn. 


(a) QUANTIFYING THE INFINITIES. For simplicity, we assume that k = r = 2. Here are one positive and
one negative result. 


(19.4.2) Theorem. (a) Let α be an infinite cardinal. If |X| > 2^α, and the 2-subsets of X are coloured
with two colours, there must exist a monochromatic set of cardinality greater than α.
(b) The 2-subsets of ℝ can be coloured so that no uncountable set is monochromatic.


(Since |ℝ| = 2^{ℵ_0}, part (b) says that the result of (a) is best possible for α = ℵ_0. In the notation
of Chapter 10, R(2, 2, ℵ_1) is the next cardinal after 2^{ℵ_0}.)


I won't prove (a) — for the proof, which is not difficult, see for example Ramsey Theory, by R. 
L. Graham et al. (1990) — but the construction for (b) is quite easy. It depends on the following 
fact. Let a family (x_α) of real numbers indexed by ordinal numbers be given, and suppose that, if
α < β, then x_α < x_β. Then the family is at most countable. For there is a ‘gap’ between x_α and the
next number in the sequence, and this gap (an interval of ℝ) contains a rational number q_α. All these
rationals are distinct. The result follows since there are only countably many rationals.

Now, by the Axiom of Choice, there is a bijection between ℝ and an ordinal number. Let x_α be
the real corresponding to the ordinal α. For α < β, colour {x_α, x_β} red if x_α < x_β, blue if x_α > x_β.
Now, according to the last paragraph, a monochromatic red set is at most countable; the same holds
for a monochromatic blue set, by reversing the order of ℝ in the argument.


(b) INFINITELY MANY COLOURS. There are two different directions possible here. The first is a simple 
extension, illustrated by the following negative result: 


(19.4.3) Theorem. The 2-subsets of a set of size 2^α can be coloured with α colours without creating
a monochromatic triangle.


PROOF. We take our set of size 2^α to be the set of all functions from the ordinal number α to {0,1}.
Now, for each β ∈ α, we colour the pair {f, g} with colour β if β is the smallest point at which f and
g disagree. Now there cannot be three functions pairwise disagreeing at the same point!
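
The colouring in this proof can be tried out on a finite analogue. The following is only a sketch under stated assumptions: the ordinal α is replaced by a finite number n, so the points are the 2^n sequences of 0s and 1s of length n, and the function names are illustrative rather than taken from the text.

```python
from itertools import combinations, product

def first_disagreement(f, g):
    """Colour of the pair {f, g}: the smallest index at which f and g differ."""
    return next(i for i, (a, b) in enumerate(zip(f, g)) if a != b)

def has_monochromatic_triangle(n):
    """Search all triples of 0/1-sequences of length n for a monochromatic triangle."""
    points = list(product((0, 1), repeat=n))      # 2**n points, n colours available
    for f, g, h in combinations(points, 3):
        colours = {first_disagreement(f, g),
                   first_disagreement(g, h),
                   first_disagreement(f, h)}
        if len(colours) == 1:                     # all three pairs got the same colour
            return True
    return False

if __name__ == "__main__":
    for n in range(1, 5):
        print(n, 2 ** n, has_monochromatic_triangle(n))
```

The search never finds a triangle, for the reason given in the proof: three sequences cannot take pairwise different values in {0,1} at one position.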


To motivate the other approach, we have to return to the basic philosophy of Ramsey theory, as 
expressed in the phrase ‘complete disorder is impossible’. We expect that, if an infinite set carries an 
arbitrary colouring, there should be an infinite subset on which the colouring is particularly simple. 
With only finitely many colours, ‘simple’ has to mean ‘monochromatic’; but in general there are other
possibilities, for example, all the colours may be different! This leads to so-called ‘canonical’ forms 
of the theorems, first developed by Erdős and Rado. For example: 


(19.4.4) Canonical Pigeonhole Principle. If the elements of an infinite set are coloured with arbitrarily
many colours, then there is an infinite subset in which either all the colours used are the same, or all 
the colours are different. 


This is clear because, if the first alternative fails, then each colour appears only finitely often, so 
infinitely many colours must be used; and using AC we can choose one point of each colour. 


Erdős and Rado proved the canonical Ramsey theorem (sometimes called the Erdős–Rado
Canonisation Theorem). Here is the formulation for k = 2.

(19.4.5) Erdős–Rado Canonisation Theorem, case k = 2. Suppose that the 2-subsets of ℕ are coloured
with arbitrarily many colours. Then there is an infinite subset Y in which one of the following
alternatives holds (where, in each pair {x,y}, we assume that x < y):

• all colours are equal;

• {x,y} and {u,v} have the same colour if and only if x = u;

• {x,y} and {u,v} have the same colour if and only if y = v;

• all colours are different.


(c) COLOURING INFINITE SETS. The result here is wholly negative: 


(19.4.6) Theorem. For any infinite set X, there is a colouring of the countable subsets of X with no 
monochromatic subsets. 


PROOF. Let P_ω(X) denote the set of countably infinite subsets of X. Define two equivalence relations
on P_ω(X) by

• A ≈ B if |A △ B| is finite;

• A ~ B if |A △ B| is finite and even;
where A △ B is the symmetric difference of A and B.

Then each ≈-class is the union of two ~-classes, so that Y and Y \ {y} belong to different
~-classes for each y ∈ Y ∈ P_ω(X). Choose one ~-class in each ≈-class and colour its members red;
colour the other sets blue.

Nevertheless, mathematicians are reluctant to call this the end. Two developments are possible.
Recognising that AC is used in that short proof, they look for positive results in set theory without
AC; or they allow, not all colourings, but only those which are ‘nice’ with respect to some structure,
such as Borel sets in a topological space.


19.5. Systems of distinct representatives 


Hall's Condition is not sufficient for a SDR for a family of sets if finiteness is not
assumed. Consider the following example: X_0 is the set of all positive integers, and
X_i = {i} for all positive integers i. Now X(J) = J if 0 ∉ J, and X(J) is infinite if
0 ∈ J. But there is no SDR since, whichever number n we choose to represent X_0,
there will be no possible representative for X_n.

However, of the two ways we could relax finiteness (allowing infinitely many sets,
and allowing infinite sets), it is the second which is crucial to the failure of Hall’s 
Theorem. This was shown by Marshall Hall, who proved the following result. 


(19.5.1) Theorem. Let A = (A_i : i ∈ I) be a family of finite sets, and suppose that
|A(J)| ≥ |J| for all finite sets J of indices. Then the family A has a SDR.


PROOF. The simplest proof uses the Propositional Compactness Theorem. Take a set
of propositional variables p_{i,x}, for all choices of i ∈ I and x ∈ A_i. Let Σ consist of
all formulae of the following types:

• for each i ∈ I, a formula asserting that p_{i,x} is true for exactly one x ∈ A_i;

• for each pair i,j of distinct indices, and each x ∈ A_i ∩ A_j, the formula (¬(p_{i,x} ∧ p_{j,x})).

A valuation v satisfying Σ defines a SDR (x_i : i ∈ I), by the rule that x_i is the
unique element x ∈ A_i for which v(p_{i,x}) = TRUE: the formulae of the second kind
guarantee that the representatives are distinct.

Now, if Σ_0 is a finite subset of Σ, and J the set of indices i for which some p_{i,x} is
mentioned in Σ_0, then the subfamily (A_i : i ∈ J) satisfies (HC), and so has a SDR;
thus, there is a valuation satisfying Σ_0. Now the result follows by compactness.
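
For a small finite family, the correspondence between satisfying valuations and SDRs can be checked by exhaustive search. The sketch below is an illustration only: the function name and the dictionary representation of the family are assumptions, and the brute-force search over all valuations is exponential, so it is not a practical way to find an SDR.

```python
from itertools import product

def sdr_by_valuation(family):
    """family: dict mapping each index i to a finite set A_i.
    Tries every valuation of the variables p[(i, x)] and returns the SDR
    defined by the first one satisfying both kinds of formulae, or None."""
    variables = [(i, x) for i, A in family.items() for x in sorted(A)]
    for bits in product((False, True), repeat=len(variables)):
        v = dict(zip(variables, bits))
        # first kind: p[(i, x)] is true for exactly one x in A_i
        exactly_one = all(sum(v[(i, x)] for x in A) == 1
                          for i, A in family.items())
        # second kind: not (p[(i, x)] and p[(j, x)]) for distinct i, j
        distinct = all(not (v[(i, x)] and v[(j, x)])
                       for i in family for j in family if i != j
                       for x in family[i] & family[j])
        if exactly_one and distinct:
            return {i: x for (i, x), value in v.items() if value}
    return None

if __name__ == "__main__":
    print(sdr_by_valuation({1: {1, 2}, 2: {2, 3}, 3: {1, 3}}))
    print(sdr_by_valuation({1: {1}, 2: {1}}))   # Hall's Condition fails here
```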
REMARK. Hall’s proof uses Zorn’s Lemma directly and is considerably more
complicated; you can read it in his book Combinatorial Theory (1989). There is a
simpler proof in the countable case, using König’s Infinity Lemma (see Exercise 5).


A great deal of work has been done on necessary and sufficient conditions for 
arbitrary families of sets to have SDRs. 


19.6. Free constructions 


One striking difference between finite and infinite combinatorics is that infinite
objects of some special kinds are much easier to construct. There is plenty of room
to manoeuvre; we just go on until the construction ‘closes up’. A couple of examples
will illustrate this.


The first concerns projective planes (Chapters 7 and 9). A projective plane is an
incidence structure of points and lines, in which any two points are incident with a 
unique line and any two lines with a unique point, and satisfying a mild condition 
to exclude degenerate cases (there exist four points, no three collinear). All known 
finite projective planes have a rich algebraic structure, depending ultimately on finite 
fields. Infinite planes are not so restricted:


(19.6.1) Proposition. Any infinite incidence structure of points and lines, in which 
two points lie on at most one line, can be embedded into a projective plane. 


PROOF. We begin by adding some ‘isolated’ points if necessary, to ensure that there
are four points with no three collinear. Now perform a construction in stages as
follows:
• at odd-numbered stages, for each pair of points which are not collinear in the
structure so far, add a line incident with just those two points;
• at even-numbered stages, for each pair of lines which are not concurrent in the
structure so far, add a point incident with just those two lines.

Now, after progressing through the natural numbers, we take the structure 
consisting of all points, lines, and incidences constructed. Given any two points, 
there is a stage at which both have been added to the structure; not later than 
the next stage, a line incident with both of them is added, and no further line 
incident with both will ever appear. The dual assertions hold similarly. So we have 
a projective plane. 
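
The first few stages of this construction can be carried out by machine. The sketch below starts from four points and no lines; the representation (points as integers, lines as sets of points) and the function names are assumptions made for illustration. Only finitely many stages are run, so the result is never a projective plane, but the stage-by-stage invariants used in the proof can be observed.

```python
from itertools import combinations

def run_stage(num_points, lines, stage):
    """Carry out one stage. Points are 0..num_points-1; lines are sets of points.
    The list `lines` is updated in place; the new number of points is returned."""
    if stage % 2 == 1:
        # odd stage: a new 2-point line for every pair of points not yet collinear
        missing = [(p, q) for p, q in combinations(range(num_points), 2)
                   if not any(p in L and q in L for L in lines)]
        lines.extend({p, q} for p, q in missing)
    else:
        # even stage: a new point on every pair of lines that do not yet meet
        disjoint = [(L, M) for L, M in combinations(lines, 2) if not (L & M)]
        for L, M in disjoint:
            L.add(num_points)
            M.add(num_points)
            num_points += 1
    return num_points

def build(stages, num_points=4):
    """Run the given number of stages starting from isolated points."""
    lines = []
    for stage in range(1, stages + 1):
        num_points = run_stage(num_points, lines, stage)
    return num_points, lines

if __name__ == "__main__":
    for s in range(1, 5):
        n, lines = build(s)
        print(s, n, len(lines))
```

After each odd stage every pair of existing points lies on exactly one line, and after each even stage every pair of existing lines meets in exactly one point; the next stage then breaks the other condition again, which is why the construction has to run through all the natural numbers.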


For example, this ‘free construction’ produces planes which do not satisfy Desar- 
gues’ Theorem (9.5.3). (Start with a ‘broken Desargues configuration’, the structure 
shown in Fig. 9.2 with one 3-point line replaced by three 2-point lines.)

Obviously, the free construction is very flexible and can be adapted to produce 
various other kinds of objects. Sometimes, however, countably many stages are 
not enough, and we need the power of transfinite induction. Here is an example. 
This concerns Steiner triple systems (Chapter 8). A Steiner triple system (STS) has 
blocks of size 3 with any two points in a unique block; a Steiner quadruple system 
(SQS) has blocks of size 4 with any three points in a unique block. Infinite Steiner 
triple systems exist; for example, they can be produced by the free construction 
(Exercise 7). If D = (Y,C) is a SQS and y ∈ Y, the derived system D_y, with point set
X = Y \ {y} and as blocks all those 3-sets B for which B ∪ {y} is a block of D, is
a STS (see Chapter 16). Conversely, which STS can be extended to SQS?


(19.6.2) Proposition. Any infinite STS can be extended to a SQS. 


PROOF. Let (X, B) be a STS, which we propose to extend to a SQS (Y,C) by adding
a point y ∉ X. Then we must have Y = X ∪ {y}, and


C ⊇ {B′ ∪ {y} : B′ ∈ B};


indeed, the set on the right consists of all blocks in C which contain y.

An n-arc is a set of n points of X containing no block of B. We see that any 
block not containing y must be a 4-arc; indeed, the set of all such blocks is a set of 
4-arcs with the property that any 3-arc is contained in exactly one of them. So the 
extension problem is equivalent to the existence of such a set; and we propose now 
to construct one by transfinite induction. Note first that there are plenty of 4-arcs: 
given any 3-arc, all but three of the remaining points extend it to a 4-arc. 

A short argument with cardinal numbers (Exercise 8) shows that the set of 
3-arcs has the same cardinality as the set X of points. Let this cardinal be m (an 
initial ordinal), and index the 3-arcs as (T_n : n < m). Now we perform the following
construction, over stages indexed by the ordinal numbers up to m. We build a set
F_n of 4-arcs for each n ≤ m as follows:

• Stage 0: Set F_0 = ∅.
• Stage n+1: If T_n is contained in some 4-arc in F_n, then set F_{n+1} = F_n. Suppose
not. Then fewer than m 4-arcs have been put into F_n, and they contain fewer
than m points. Three more points fail to extend T_n to a 4-arc. So we can find a
point z such that T_n ∪ {z} = F is a 4-arc and z lies in no member of F_n. Thus,
no 3-subset of F is contained in a member of F_n. Set F_{n+1} = F_n ∪ {F}.
• Limit stage n: let F_n = ⋃_{k<n} F_k.

At stage m, we have ensured that every 3-arc lies in a unique member of F_m,
and the theorem is proved.


REMARK. Using techniques of logic, it can be deduced from (19.6.2) that only finitely
many finite STS fail to be extendable. No examples of non-extendable STS are
known!


19.7. The random graph 
I will end this chapter with what I confess is one of my favourite topics in
combinatorics.


(19.7.1) Erdős–Rényi Theorem
There is only one countably infinite random graph. 


Some explanation is called for. By a random graph I mean one produced by 
the following stochastic process. Fix a set X of vertices. For each 2-element subset 
{x,y} of X, toss a fair coin;⁹ if it comes down heads, then join x and y by an edge,
otherwise leave them unjoined. If X is finite, this procedure gives each (labelled)
graph the same (non-zero) chance of being picked. Moreover, if we are interested in
unlabelled graphs, we can see that the more symmetric a graph is, the less its chance
of occurring. [The symmetric group Sym(X) acts on the set of labelled graphs;
its orbits are the isomorphism classes (the unlabelled graphs), and the stabiliser
of a graph Γ is its automorphism group Aut(Γ). By (14.3.4), the product of the
probability of a given unlabelled graph and the order of its automorphism group is
n!/2^{n(n−1)/2}, where n = |X|.] By contrast,


there is a countably infinite graph R such that, with probability 1, 
a random countably infinite graph is isomorphic to R. Moreover, 
R has a very large automorphism group. 


It is my contention that this illustrates an important difference between mathe- 
matics and virtually all other subjects. In no other field could such an apparently 
outrageous claim be made completely convincing by a short argument, as I propose 
to give. The claim also illustrates that our intuition about the infinite is likely to be 
caught out very often. 


Probability theory (or measure theory) for infinite spaces resembles the familiar 
finite theory, with a few additions. The significant one here is the concept of a null 
event (or null set), one with probability zero. If an event E has the property that, for
any ε > 0, there is an event E_ε ⊇ E with probability Prob(E_ε) < ε, then E is null
(Prob(E) = 0). It is an easy exercise to show that the union of a countable set of
null events is null. [Suppose that E_n is null for all n ≥ 1. Given ε > 0, choose E_{n,ε}
containing E_n with Prob(E_{n,ε}) < ε/2^n, and set E_ε = ⋃_{n≥1} E_{n,ε}. Then Prob(E_ε) < ε,
and ⋃_{n≥1} E_n ⊆ E_ε.]


Now we begin on the proof. It depends on the following property (*), which a 
graph may or may not have: 


Given any two finite disjoint sets U,V of vertices, there exists a 
vertex z joined to every vertex in U and to no vertex in V. 


The Erdős–Rényi Theorem follows from the following two assertions:
1. With probability 1, a random countable graph satisfies (*).
2. Up to isomorphism, there is a unique countable graph which satisfies (*).


PROOF OF 1. We have to show that the event that (*) fails is null. Now there are
only countably many pairs (U,V) of disjoint finite sets of vertices; so it is enough 
to prove that, for a fixed choice of U and V, the probability that no vertex z exists 
satisfying the conditions is zero. Call a vertex good if it is joined to everything in 


9 In the language of probability theory, tosses of a fair coin are independent, and each outcome of a 
toss has probability 1/2 of occurrence.

10 Finite random graphs are not as unstructured as this discussion might suggest; global patterns 
arise from the local chaos. This will be discussed in the next chapter. 
U and nothing in V, and bad otherwise. Any vertex z is most likely to be bad;
the probability of this is 1 − 1/2^n, where n = |U ∪ V|. But there are infinitely many
vertices, and the events that they are bad are all independent. So the probability
that vertices z_1,...,z_N are all bad is (1 − 1/2^n)^N. Since this tends to zero as N → ∞,
the assertion is proved.
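
To get a feeling for how quickly this probability falls, the quantity (1 − 1/2^n)^N can simply be evaluated. The snippet below is a small numerical illustration; the function name is an assumption, and floating-point arithmetic is used, so the values are approximations.

```python
def prob_all_bad(n, N):
    """Probability that N given vertices are all bad, when |U ∪ V| = n."""
    return (1 - 0.5 ** n) ** N

if __name__ == "__main__":
    for N in (10, 100, 1000):
        print(N, prob_all_bad(3, N))
```

For fixed n the values decrease geometrically in N, which is all the proof needs; larger n merely slows the decay.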


PROOF OF 2. This illustrates a logical technique called back-and-forth. Suppose that
Γ and Δ are two countably infinite graphs satisfying (*), with vertex sets X =
{x_1, x_2, ...} and Y = {y_1, y_2, ...} respectively. We build, in stages, an isomorphism
θ between them, as follows. At the beginning of any stage, the value of θ has been
determined on finitely many points of X.

At an odd-numbered stage, let x_n be the first point of X on which θ has not
been defined (that is, the point with lowest index). Let U′ and V′ be the (finite) sets
of neighbours and non-neighbours of x_n respectively on which θ has been defined.
In order to extend θ to x_n, we must find a point z ∈ Y which is joined to every
point of θ(U′) and to no point of θ(V′). Since Δ satisfies (*), such a point z exists;
choose one (for definiteness, the one with lowest index), and set θ(x_n) = z.

At an even-numbered stage, let y_m be the first point of Y not in the range of θ.
Argue as above, using the fact that Γ satisfies (*), to find a suitable pre-image of y_m.

After countably many stages, we have ensured that every point of X is in the
domain of θ, and every point of Y is in the range. (This is the point of going
back-and-forth; if we only went ‘forth’, we would define a one-to-one map but
couldn’t guarantee it to be onto.) Moreover, θ is clearly an isomorphism, and we
are done.


The name R stands for ‘random graph’. The proof we have given is an existence
proof; if an event (such as property (*)) occurs with probability 1, then it certainly
occurs, so there exists a graph with this property; assertion 2 shows its uniqueness.¹¹
For Erdős and Rényi, an existence proof was enough; but an explicit construction
is more satisfactory. R can be produced by a variant of the ‘free construction’ of the
preceding section: at each stage, add vertices fulfilling all instances of (*) where U
and V consist of previously constructed vertices. But one can be even more definite.
A direct construction was given by Rado, whose name is also commemorated by
the letter R.

Rado took the vertex set to be the set of non-negative integers. Given x and y,
where x < y, to decide whether to join x to y, we express the larger number y to
base 2; that is, we write it as a sum Σ_{i∈X} 2^i of distinct powers of 2. If 2^x is one of
these powers (that is, if x ∈ X), then join x to y; otherwise, don’t join. Property (*)
is easy: adding an element to U if necessary, we can assume that max(U) > max(V);
then z = Σ_{u∈U} 2^u has the required property.
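
Rado's rule and the witness for property (*) translate directly into a few lines of code. This is a sketch: the function names `joined` and `witness` are illustrative, and the element added to U "if necessary" is taken here to be one more than the largest number in U ∪ V, which is one choice among many.

```python
def joined(x, y):
    """Rado's rule: for x < y, join x to y if 2**x occurs in the base 2 expansion of y."""
    if x == y:
        return False
    x, y = min(x, y), max(x, y)
    return bool((y >> x) & 1)

def witness(U, V):
    """Return a vertex z outside U and V, joined to everything in U and nothing in V."""
    U, V = set(U), set(V)
    assert not (U & V), "U and V must be disjoint"
    if not U or (V and max(U) <= max(V)):
        # add an element to U so that max(U) > max(V)
        U.add(max(U | V, default=-1) + 1)
    return sum(2 ** u for u in U)

if __name__ == "__main__":
    z = witness({0, 3}, {1, 2})
    print(z, [joined(u, z) for u in (0, 3)], [joined(v, z) for v in (1, 2)])
```

Since z is at least 2^max(U), it exceeds every element of U and V, so the rule is always applied with z as the larger number; the witness can be very large, and the search for the lowest-index witness used in back-and-forth arguments would proceed differently.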


The graph R has many nice properties. I will describe two of these, known
as universality and homogeneity. A graph is said to be universal if it satisfies the
conclusion of the next result.


11 This shows that, paradoxically, probability theory is an important tool in proving the existence of
objects. Exercise 12 gives another instance, and a finite example was given in Chapter 10. A related
concept in topology, Baire category, has similar uses. See J. C. Oxtoby, Measure and Category (1980),
for many entertaining illustrations.


(19.7.2) Proposition. Any finite or countable graph is an induced subgraph of R. 


PROOF. We use the machinery of back-and-forth, but going forth only. In other
words, take the graph Δ to be R (i.e., to have property (*)), and let Γ be any finite
or countable graph. Proceeding only from Γ to Δ (as in the ‘odd-numbered steps’
before), we construct a one-to-one map from Γ to Δ whose image is an induced
subgraph of Δ isomorphic to Γ.
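
For a finite graph Γ the ‘forth’ steps can be executed explicitly inside Rado's model of R (non-negative integers, with x < y joined when 2^x occurs in the base 2 expansion of y). The sketch below is an illustration under assumptions: Γ has vertices 0, ..., n−1 given by an edge list, the function names are hypothetical, and each step picks the lowest-numbered suitable vertex by a plain search, which is practical only for very small graphs.

```python
from itertools import count

def joined(x, y):
    """Adjacency in Rado's graph on the non-negative integers."""
    if x == y:
        return False
    x, y = min(x, y), max(x, y)
    return bool((y >> x) & 1)

def embed(n, edges):
    """Map vertices 0..n-1 of a finite graph one at a time into Rado's graph,
    so that the image is an induced copy of the graph. Returns the list of images."""
    E = {frozenset(e) for e in edges}
    image = []
    for k in range(n):
        U = [image[j] for j in range(k) if frozenset((j, k)) in E]      # images of neighbours
        V = [image[j] for j in range(k) if frozenset((j, k)) not in E]  # images of non-neighbours
        z = next(c for c in count()
                 if c not in image
                 and all(joined(c, u) for u in U)
                 and not any(joined(c, v) for v in V))
        image.append(z)
    return image

if __name__ == "__main__":
    five_cycle = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
    print(embed(5, five_cycle))
```

The search in each step terminates because property (*) guarantees that a suitable vertex exists.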


A graph Γ is said to be homogeneous if the following condition holds:


Let φ be any isomorphism between finite induced subgraphs of Γ.
Then there is an automorphism θ of Γ which extends φ.


(19.7.3) Proposition. R is homogeneous. 


PROOF. This is again proved by back-and-forth. We take the two graphs Γ and Δ
to be equal to R, but modify the start of the construction: instead of starting with
no information about θ, we take its initial value to be the given map φ. Then the
argument produces an isomorphism from Γ to Δ (that is, an automorphism of R)
which agrees with φ on its domain.


It follows that the automorphism group of R is infinite. For let the vertices
be {x_1, x_2, ...}. Since all 1-vertex induced subgraphs are isomorphic, there is an
automorphism θ_n mapping x_1 to x_n for each n. In fact this group has cardinality
2^{ℵ_0}, the same as that of the symmetric group on a countable set; see Exercise 12.


It can be shown that any countable homogeneous graph which contains all 
finite graphs as induced subgraphs is necessarily isomorphic to R. A much more 
difficult result is a theorem of Lachlan and Woodrow which determines all countable 
homogeneous graphs. 


19.8. Exercises 


1. Prove that the set of finite subsets of a countable set is countable, but that the set 
of all subsets is not. 


2. (a) Use König’s Infinity Lemma to show that every countable poset has a linear
extension. 

(b) Use the Propositional Compactness Theorem to show that every poset has a 
linear extension. 


3. Prove the Axiom of Choice, assuming Zorn’s Lemma. [HINT: consider the set of
partial choice functions for a family of sets, ordered by inclusion.] 


4. Prove the infinite Ramsey Theorem for all k. [HINT: The proof is by induction
on k. Follow the argument given, but replacing condition (b) by

(b′) y_i ∈ Y, and for all (k − 1)-subsets Z of Y_i, the sets {y_i} ∪ Z have the same
colour.

(Use Ramsey's Theorem with k − 1 replacing k to construct Y_i.) Now, in the
constructed sequence (y_1, y_2, ...), the colour of a k-set depends only on its element
y_i with smallest index i. The final application of the Pigeonhole Principle is essentially
the same.]

5. Use König’s Infinity Lemma to prove the countable version of Hall’s Theorem
for families of finite sets (19.5.1).


6. Why do we use König’s Infinity Lemma and the Propositional Compactness Theorem
to prove the finite Ramsey Theorem from the infinite, but the infinite
Four-colour Theorem from the finite?


7. Modify the free construction of (19.6.1) to produce infinite Steiner triple systems.


8. Prove that the number of 3-arcs in an infinite STS is equal to the number of
points. [HINT: α² = α for all infinite cardinals α.]


9. Show that, in an infinite projective plane, the (cardinal) number of points on any
line is equal to the total number of points. Hence show that any infinite projective
plane contains a set S of points such that |S ∩ L| = 2 for all lines L.


10. Prove that a countably infinite graph Γ is a spanning subgraph of R if and only
if Γ satisfies the following condition:


for any finite set V of vertices, there is a vertex z joined to no vertex in V.


11. (a) Prove that R is isomorphic to its complement.
(b) Prove that R is isomorphic to R − v for any vertex v, and to R − e for any
edge e. (In other words, R is immune to any finite amount of tampering.)


12. Let S be a set of positive integers. Let Γ(S) be the graph with vertex set ℤ, in
which x and y are joined if and only if |x − y| ∈ S.
(a) Prove that the map x ↦ x + 1 is an automorphism of Γ(S) which permutes the
vertices in a single infinite cycle (a cyclic automorphism). (In the language of
Section 14.7, Γ(S) is the Cayley graph of the additive group of ℤ with respect to the
set S ∪ (−S).)
(b) Choose S at random by tossing a fair coin for each positive integer n, putting
n ∈ S if the result is heads and not otherwise. Prove that, with probability 1, Γ(S)
is isomorphic to R.
(c) Deduce that R has a cyclic automorphism.
(d) Show that an event with probability 1 is uncountable, and deduce that the
automorphism group of R is uncountable. (Read ‘cardinality 2^{ℵ_0}’ for ‘uncountable’
here and try the resulting harder problem.)


20. Where to from here? 


This kind of rather highflown speculation is an essential part of my job. 
Without some capacity for it I could not have qualified as a Mobile, and I
received formal training in it on Hain, where they dignify it with the title of 
Farfetching. 


Ursula K. LeGuin, The Left Hand of Darkness (1969) 


This final chapter has two purposes. A few topics not considered earlier are
discussed briefly; usually there is a central problem which has served as a focus 
for research. Then there is a list of assorted problems in other areas, and some 
recommended reading for further investigation of some of the main subdivisions of 
combinatorics. 


20.1. Computational complexity 


This topic belongs to theoretical computer science; but many of the problems of 
greatest importance are of a combinatorial nature. In the first half of this century,
it was realised that some well-posed problems cannot be solved by any mechanical 
procedure. Subsequently, interest turned to those which may be solvable in principle 
but for which a solution may be difficult in practice, because of the length of time 
or amount of resources required. To discuss this, we want a measure of how hard,
computationally, it is to solve a problem. The main difficulty here lies in defining
the terms!


PROBLEMS. 

Problems we may want to solve are of many kinds: anything from factorising a 
large number to solving a system of differential equations to predict tomorrow’s 
weather. In practice, we usually have one specific problem to solve; but, in order to 
do mathematics, we must consider a class of problems. 

For example, from a mathematician’s point of view, finding a winning strategy
for chess is trivial, since there are only finitely many configurations to consider.
(The laws of chess put an upper bound on the number of moves in a game, and
the number of possibilities at each move is clearly finite.) So mathematicians define
‘generalised chess’, played on an n × n board.

For illustration, we consider the following class of problems, known by the term 
HAMILTONIAN CIRCUIT: given a graph Γ, does it have a Hamiltonian circuit? We
saw in Section 11.7 that there is a trivial algorithm which solves this problem, but 
it is extremely inefficient! 

Obviously the ‘complexity’ of the problem depends on the size of the input data
— bigger graphs will pose harder problems, in general — so we need first a measure 
of the size of the data. We use an information-theoretic measure: the number of 
bits of information needed to present the data. For example, a graph with n vertices
can be encoded as n² bits, as follows: number the vertices as v_1, v_2, ..., v_n; then let


a_{ij} = 1 if v_i is joined to v_j by an edge;
a_{ij} = 0 otherwise.


This is essentially the adjacency matrix. Then represent the graph by the sequence

a_{11} a_{12} ... a_{1n} a_{21} a_{22} ... a_{2n} ... a_{n1} a_{n2} ... a_{nn}.


Note that this method is somewhat redundant; we know that a_{ii} = 0 for all i
(no vertex is joined to itself), and a_{ij} = a_{ji} for all i,j (joins are not directed), so
we could encode the same information in half the space. We will not strive for the 
most efficient representation! This leads to an important principle: Our complexity 
measures should be such that (within reason) 


different representations of the input data don’t change the com- 
plexity of a problem. 
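
As a concrete illustration of the two representations just discussed, the sketch below writes out the full n²-bit sequence and the shorter sequence that keeps only the entries a_{ij} with i < j. The function names and the edge-list input format are assumptions for the example.

```python
def encode(n, edges):
    """The n*n-bit sequence a_11 a_12 ... a_nn for a graph on vertices 1..n."""
    E = {frozenset(e) for e in edges}
    return "".join("1" if frozenset((i, j)) in E else "0"
                   for i in range(1, n + 1) for j in range(1, n + 1))

def encode_upper(n, edges):
    """Only the entries a_ij with i < j: n(n-1)/2 bits carrying the same information."""
    E = {frozenset(e) for e in edges}
    return "".join("1" if frozenset((i, j)) in E else "0"
                   for i in range(1, n + 1) for j in range(i + 1, n + 1))

if __name__ == "__main__":
    path = [(1, 2), (2, 3)]          # the path v_1 - v_2 - v_3
    print(encode(3, path))           # rows 010, 101, 010 of the adjacency matrix
    print(encode_upper(3, path))     # a_12 a_13 a_23
```

The two lengths, n² and n(n−1)/2, differ by a factor of about two, so each is bounded by a polynomial in the other; this is the sense in which the choice between them should not affect the complexity.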


Rather than defining ‘within reason’, I'll illustrate with a data representation which 
is unreasonably wasteful. Consider the problem PRIME: given an integer N, decide 
whether it is prime. The integer N could be given as a sequence of N ones; but 
this is ridiculous, since the base 2 representation of N uses only 1 + ⌊log_2 N⌋ binary
digits.
We make one further simplification: we consider only problems with a simple 
yes-no answer, so-called decision problems. The HAMILTONIAN CIRCUIT problem 
is of this form, as is PRIME. A more general problem can be reduced to several 
decision problems using the ‘twenty questions’ principle (Chapter 4). For example, 
the problem ‘What is the longest circuit in the graph G?’ can be solved by a number 
of instances of ‘Does G have a circuit of length ≥ k?’; only ⌈log_2 n⌉ questions are
required, where n is the number of vertices.
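
The ‘twenty questions’ reduction can be sketched as a binary search driven by a yes-no oracle. In the code below the oracle is a brute-force search, included only so that the example runs; it is exponential and is not meant as a sensible algorithm. The function names, the vertex labelling 0, ..., n−1 and the convention of returning 0 for a graph with no circuit are all assumptions.

```python
from itertools import combinations, permutations

def has_circuit_at_least(n, edges, k):
    """Decision oracle (brute force): is there a circuit of length >= k?"""
    E = {frozenset(e) for e in edges}
    for size in range(max(k, 3), n + 1):
        for verts in combinations(range(n), size):
            first, rest = verts[0], verts[1:]
            for order in permutations(rest):
                cycle = (first,) + order
                if all(frozenset((cycle[i], cycle[(i + 1) % size])) in E
                       for i in range(size)):
                    return True
    return False

def longest_circuit(n, edges):
    """Binary search on k; returns (length of longest circuit, questions asked)."""
    questions = 0
    lo, hi = 2, n                      # lo = 2 stands for "no circuit found"
    while lo < hi:
        mid = (lo + hi + 1) // 2
        questions += 1
        if has_circuit_at_least(n, edges, mid):
            lo = mid
        else:
            hi = mid - 1
    return (lo if lo >= 3 else 0), questions

if __name__ == "__main__":
    triangle_with_tail = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 5)]
    print(longest_circuit(6, triangle_with_tail))
```

The search relies on the answers being monotone in k: a circuit of length at least k is also a circuit of length at least k − 1.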


RESOURCES.
The complexity of a problem should be a measure of the computational resources
needed to solve it.

There are various resources which have been considered: time, memory space, 
number of processors (in a parallel processing system), etc. In practice, time is 
usually the limiting factor, and I will consider only this one. 

Of course, different computers run at different speeds, and we have to allow for 
this. We standardise by taking the unit of time to be that required for the processor 
to carry out one operation. The effect of processor speeds is not so significant. For 
example, if a computation takes 10°° processor cycles to perform, it doesn’t matter 
whether the computer runs at 1 or 1000 million cycles per second. (There are fewer 
than 10° seconds in a year; and the universe is fewer than 101° years old, according 
to current theory.) 

Different processors can carry out different amounts of work in a single cycle. 
Again, this dictates that our complexity measure should not be too precise: 


processor details should not change the complexity of a problem. 
The theoretical analysis is based on a Turing machine, almost the most primitive
machine imaginable. It consists of a ‘tape’ on which information can be written,
extending infinitely far in both directions (but having all but a finite amount blank),
and a ‘head’ positioned over the tape so that it can read or write to one location
on the tape. (The tape and head correspond to the memory and CPU of a real
computer.) The head can also be in one of a finite number of ‘internal states’, and
each tape location can have one of a finite number of ‘symbols’ (including ‘blank’)
written on it. In one cycle, the machine can write a symbol on the tape, change its
internal state, and move one position left or right.
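
A machine of this kind is easy to simulate, which may help to make the description concrete. The following is a minimal sketch under assumptions: the transition table format, the state names and the blank symbol "_" are choices made for this example, and the sample machine (which decides whether a string of 0s and 1s contains an even number of 1s) is not taken from the text.

```python
def run(transitions, tape_input, start, halting, blank="_"):
    """Simulate a Turing machine; returns (final state, number of cycles used).
    transitions maps (state, symbol) to (symbol to write, move 'L' or 'R', new state)."""
    tape = dict(enumerate(tape_input))       # unwritten cells are blank
    state, head, steps = start, 0, 0
    while state not in halting:
        symbol = tape.get(head, blank)
        write, move, state = transitions[(state, symbol)]
        tape[head] = write
        head += 1 if move == "R" else -1
        steps += 1
    return state, steps

# Sample machine: accept if and only if the input contains an even number of 1s.
PARITY = {
    ("even", "0"): ("0", "R", "even"),
    ("even", "1"): ("1", "R", "odd"),
    ("odd", "0"): ("0", "R", "odd"),
    ("odd", "1"): ("1", "R", "even"),
    ("even", "_"): ("_", "R", "accept"),
    ("odd", "_"): ("_", "R", "reject"),
}

if __name__ == "__main__":
    print(run(PARITY, "1101", "even", {"accept", "reject"}))
```

This machine uses one cycle per input symbol plus one for the first blank, so its running time on n bits is n + 1.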
The details are not too important. What is important is that¹
• any computation possible on any machine (theoretical or practical) can be
performed by a Turing machine; and
• any processor ever made can in a single cycle perform only the equivalent of a
bounded number of Turing machine steps.
Thus we define the complexity of a class C of problems to be the function f_C
defined by


f_C(n) = the least m such that a Turing machine can solve
         any instance of C whose data consists of n bits in
         at most m steps.


We call two classes C_1 and C_2 equivalent if there is a polynomial p(x) such
that f_{C_1}(n) ≤ f_{C_2}(p(n)) and f_{C_2}(n) ≤ f_{C_1}(p(n)). This definition encompasses our
principles that different data representations and different processor details should
not alter the complexity of a problem (that is, the resulting complexity measures
should give equivalent results). So all classes C for which f_C(n) is bounded by
a polynomial in n are equivalent, but are not equivalent to problems which take
exponentially long to solve.


ALGORITHMS. 

We haven't specified how the problem is to be solved. The definition of complexity 
presupposes that the most efficient algorithm is used. This means that upper bounds 
on complexity are much easier to prove than lower bounds. To show that the 
complexity of C is at most F(n), we just have to exhibit an algorithm which solves 
any instance of C of size n in at most F(n) steps. But to show that the complexity is 
at least F(n) is much tougher; we have to prove that no algorithm can exist which 
takes fewer than F(n) steps. 

Consider the problem PRIME, for instance. Remember that an integer N is to
be input in base 2 representation, so that if the input has size n, then N may be
as large as $2^n - 1$. If we use ‘trial division’, checking all numbers up to $\sqrt{N}$ to see
if they divide N, we will take at least $2^{n/2}$ steps: exponentially many! Using much
more elaborate number theory, it has been shown that the complexity of PRIME
doesn’t exceed $n^{c \log\log n}$. Conceivably, it is polynomial in n.
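The growth of trial division with the input size can be seen in a small experiment. The following is only a sketch: the function name `trial_division` and the choice of test numbers (primes just below 2^8, 2^16 and 2^24) are mine, not part of the text.

```python
import math

def trial_division(N):
    """Test N for primality by trial division.

    Returns (is_prime, divisions), where divisions counts the candidate
    divisors d = 2, 3, ..., floor(sqrt(N)) actually tried.
    """
    if N < 2:
        return False, 0
    divisions = 0
    for d in range(2, math.isqrt(N) + 1):
        divisions += 1
        if N % d == 0:
            return False, divisions   # found a proper factor
    return True, divisions            # no factor up to sqrt(N)

if __name__ == "__main__":
    # Primes just below 2**n for input sizes n = 8, 16, 24 bits: the
    # worst case, since every candidate divisor has to be tried.
    for n, N in [(8, 251), (16, 65521), (24, 16777213)]:
        is_prime, divisions = trial_division(N)
        print(f"n = {n:2d}  N = {N:8d}  prime = {is_prime}  divisions = {divisions}")
```

Each time the input grows by 8 bits, the number of divisions grows by a factor of about 16, that is, $2^{8/2}$.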


1 Alan Turing would have argued that these statements hold true for the human brain as well as any [...]


THE CLASS P.

We say that the class C of problems is polynomial-time solvable, or belongs to P, if
its complexity is not greater than a polynomial function of n. As we saw earlier,
this property does not depend on details about data representation or processors.
To show that a class C belongs to P, we have to describe an algorithm which
solves problems in C in polynomial time. This has been done in a number of
instances, some of which are not at all obvious. For example:

(a) Given integers M and N, does M divide N? The primary-school long division 
algorithm decides this in polynomial time. 

(b) Given a graph G, is it connected? We saw an algorithm for this problem in 
Section 11.11; it runs in polynomial time. 

(c) The greedy algorithm for a minimal connector (Section 11.3) is polynomial.

(d) LINEAR PROGRAMMING. This is traditionally solved by the ‘simplex method’.
Though this is efficient in practice, there are some contrived problems which it
takes exponentially long to solve. In the last decade, Khachiyan found a different
algorithm (the ‘ellipsoid method’) which runs in polynomial time. Subsequently
Karmarkar found another polynomial-time algorithm.
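As an illustration of example (b), here is a minimal sketch of a connectedness test by breadth-first search. It is not necessarily the algorithm of Section 11.11; the function name and the representation of a graph (vertices 0,...,n-1 and a list of edges) are assumptions made for the example. Each vertex enters the queue at most once and each edge is examined at most twice, so the running time is clearly polynomial.

```python
from collections import deque

def is_connected(n, edges):
    """Decide whether the graph on vertices 0..n-1 with the given edges is connected."""
    if n == 0:
        return True
    adjacency = [[] for _ in range(n)]
    for u, v in edges:
        adjacency[u].append(v)
        adjacency[v].append(u)
    seen = {0}                 # vertices reached so far, starting from vertex 0
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for w in adjacency[u]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return len(seen) == n      # connected iff every vertex was reached
```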

Current ‘received wisdom’ is that the problems in C are ‘tractable’ if C belongs to
P, and are ‘intractable’ otherwise. (In fact, problems which are not quite polynomial,
such as PRIME, are regarded as ‘tractable’ as well. Large numbers are routinely
tested for primality by known algorithms of complexity $n^{c \log\log n}$. For n = 1000, that
is, numbers with about 300 decimal digits, log log n is only about 1.93, taking
natural logarithms.)


THE CLASS NP.
There is an important type of problem for which no polynomial-time algorithms are 
known; but, if a solution is proposed, then its validity can be checked quickly. 

Imagine that you are a travelling salesman with a briefcase full of Hamiltonian 
graphs. Your customers don’t have a quick way of deciding the HAMILTONIAN 
CIRCUIT problem — if they did, they wouldn't be your customers — but they want 
to buy graphs with Hamiltonian circuits. You show them a graph, and tell them 
a Hamiltonian circuit in the graph; they can easily (meaning ‘in polynomial time’) 
check that your claim is correct. 

A class C of decision problems is said to belong to NP if, for any problem in the 
class for which the answer is ‘yes’, there is a ‘certificate’, a piece of information using 
which it is possible to verify the correctness of the answer in polynomial time. Thus, 
an explicit Hamiltonian circuit is a certificate for HAMILTONIAN CIRCUIT, showing 
that it belongs to NP. 
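Checking a certificate for HAMILTONIAN CIRCUIT can be sketched as follows. The function name and the data format (vertices 0,...,n-1, a list of edges, and the proposed circuit as a list of vertices) are my own choices for illustration. The check takes one pass over the circuit, so it is polynomial in the size of the data.

```python
def verify_hamiltonian_circuit(n, edges, circuit):
    """Check that `circuit` is a Hamiltonian circuit of the graph.

    The certificate is accepted if it lists every vertex 0..n-1 exactly
    once and consecutive vertices (cyclically) are joined by edges.
    """
    edge_set = {frozenset(e) for e in edges}
    if n < 3 or sorted(circuit) != list(range(n)):
        return False
    return all(
        frozenset((circuit[i], circuit[(i + 1) % n])) in edge_set
        for i in range(n)
    )
```

Note that a rejected certificate proves nothing: the graph may still have some other Hamiltonian circuit.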

The letters NP stand for ‘non-deterministic polynomial’, deriving from another 
way of viewing this concept. A class C belongs to NP if a problem in C of size n 
can be solved in time which is polynomial in n by a program which is allowed to 
make some lucky guesses. (You can find a Hamiltonian circuit in polynomial time 
by guesswork, if you are lucky!) 

The class P is contained in NP: problems in P can be solved quickly without
recourse to certificates or guesswork. Many other classes of problems are in NP:
HAMILTONIAN CIRCUIT (as we've seen), GRAPH ISOMORPHISM (deciding whether
two graphs are ‘the same’), SATISFIABILITY (does a Boolean formula take the value
TRUE for some choice of values for its variables?), DECODING A LINEAR CODE.
(Encoding a linear code is in P, since it is simple linear algebra.) It is true, though
not obvious, that PRIME belongs to NP. But the opposite problem, COMPOSITE
(i.e. is N composite?) is clearly in NP. (A certificate for the compositeness of N
is a proper factor M of N: we saw that divisibility can be checked in polynomial
time.) The main open problem is:


Is P ≠ NP?


NP-COMPLETENESS. 

We say that a class $C_1$ is reducible to a class $C_2$ if, given a problem in $C_1$ (with data
of size n), we can compute the data for a problem in $C_2$ with the same answer, in a
time which is polynomial in n. Thus, if $C_1$ is reducible to $C_2$, and if $C_2$ belongs to P
(or NP, respectively), then so does $C_1$. Intuitively, it means that problems in $C_1$ are
no harder than those in $C_2$.
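A small example may make the idea of reduction concrete. The problem HAMILTONIAN PATH (is there a path through all the vertices?) is not discussed in the text; I introduce it here purely as an illustration. It reduces to HAMILTONIAN CIRCUIT by adding one new vertex joined to all the others: for a graph with at least two vertices, a Hamiltonian path in the old graph closes up through the new vertex, and conversely. The brute-force solvers below take exponential time and serve only to check the reduction on small graphs; all function names are hypothetical.

```python
from itertools import permutations

def reduce_path_to_circuit(n, edges):
    """Turn a HAMILTONIAN PATH instance into a HAMILTONIAN CIRCUIT instance.

    A new vertex (numbered n) is joined to every old vertex; this takes
    time linear in the size of the data.
    """
    return n + 1, list(edges) + [(v, n) for v in range(n)]

def _follows_edges(order, edge_set, closed):
    # Check consecutive pairs; if `closed`, also the pair (last, first).
    steps = len(order) if closed else len(order) - 1
    return all(
        frozenset((order[i], order[(i + 1) % len(order)])) in edge_set
        for i in range(steps)
    )

def has_hamiltonian_path(n, edges):
    """Brute force over all n! orderings (exponential; for checking only)."""
    edge_set = {frozenset(e) for e in edges}
    return any(_follows_edges(p, edge_set, False) for p in permutations(range(n)))

def has_hamiltonian_circuit(n, edges):
    """Brute force over all n! orderings (exponential; for checking only)."""
    edge_set = {frozenset(e) for e in edges}
    return n >= 3 and any(
        _follows_edges(p, edge_set, True) for p in permutations(range(n))
    )
```

The point is that only `reduce_path_to_circuit` needs to be fast: any quick method for the second problem would then give a quick method for the first.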

Stephen Cook proved in 1971 that NP contains a class C of problems such that
any class in NP is reducible to C. The class he gave was SATISFIABILITY of Boolean 
formulae. A class with Cook’s property consists of ‘the hardest problems in NP’, in 
the sense that if a polynomial-time algorithm for such a class were ever found, then 
it would follow that every problem in NP would have a polynomial-time solution, 
that is, that P = NP. Such a class is called NP-complete. 

Since Cook’s work, hundreds of classes have been shown to be NP-complete, 
including HAMILTONIAN CIRCUIT and DECODING A LINEAR CODE. 

In summary, then, we regard a class of problems as ‘easy’ if it lies in P, or 
nearly so; and as ‘hard’ if it is at least as hard as an NP-complete class. (There are 
problems which are much harder than anything in NP; typical examples are finding 
winning strategies in positional games, where we have to consider each possible 
response of our opponent, each response we could make to it, each response of 
our opponent to our move, and so on; the branching tree of possibilities requires
exponential time to analyse.) Other problems, notably GRAPH ISOMORPHISM, are
in NP but not known to be either in P or NP-complete; it is thought that they 
may lie strictly between these two classes. 


The standard reference on P and NP is Computers and Intractability: A Guide to
the Theory of NP-Completeness by M. R. Garey and D. S. Johnson (1979), which lists
hundreds of problems (many of them combinatorial) together with their classification 
as in P, NP-complete, neither, or ‘don’t know’. 


20.2. Some graph-theoretic topics 


This section presents thumbnail sketches of three topics in graph theory
(reconstruction, higher regularity conditions, and random graphs), which haven’t been
mentioned yet.


GRAPH RECONSTRUCTION. 

The reconstruction problem for graphs (in two versions, one for vertices and one for
edges) has the fascination of a long-standing open problem, and also has unexpected
links with other topics. In its original, vertex form, the problem is as follows. Given
a graph Γ with n vertices, construct a ‘deck of cards’; the i-th card carries a drawing
or specification of the subgraph of Γ obtained by deleting the i-th vertex of Γ, for
i = 1,...,n. Now we ask: can Γ be reconstructed, up to isomorphism, from the
information provided by the deck of cards? Graphs on two vertices cannot be
reconstructed in this way, since each of them has a deck of two cards, each with a
1-vertex graph on it. However, the vertex-reconstruction conjecture asserts that any
graph with more than two vertices is reconstructible.

More formally, call graphs Γ and Δ hypomorphic if there is a bijection φ from
the vertex set of Γ to that of Δ such that Γ - v and Δ - φ(v) are isomorphic, for
each vertex v of Γ. The conjecture asserts that hypomorphic graphs with more than
two vertices are isomorphic.
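For very small graphs, decks and hypomorphism can be computed directly. In the sketch below, each card is replaced by a canonical form found by brute force over all relabellings (so this is usable only for a handful of vertices); two graphs are hypomorphic exactly when their decks agree as multisets. The function names and the edge-list representation are assumptions of the example.

```python
from itertools import permutations

def canonical_form(vertices, edges):
    """Smallest relabelled edge list over all bijections to 0..k-1 (brute force)."""
    vertices = sorted(vertices)
    best = None
    for images in permutations(range(len(vertices))):
        relabel = dict(zip(vertices, images))
        form = tuple(sorted(
            tuple(sorted((relabel[u], relabel[v]))) for u, v in edges
        ))
        if best is None or form < best:
            best = form
    return len(vertices), best

def vertex_deck(n, edges):
    """The deck of vertex-deleted subgraphs, each card up to isomorphism."""
    deck = []
    for deleted in range(n):
        remaining = [v for v in range(n) if v != deleted]
        kept = [(u, v) for u, v in edges if deleted not in (u, v)]
        deck.append(canonical_form(remaining, kept))
    return sorted(deck)

def hypomorphic(n, edges1, edges2):
    """Two graphs on n vertices are hypomorphic iff their decks coincide."""
    return vertex_deck(n, edges1) == vertex_deck(n, edges2)
```

For instance, `hypomorphic(2, [], [(0, 1)])` holds although the two graphs are not isomorphic, which is the two-vertex exception mentioned above.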

The problem is open, but many partial results exist. On one hand, it is known 
to be true for many particular classes of graphs (disconnected graphs, trees, regular 
graphs, etc.). On the other, many properties are known such that, if two graphs are 
hypomorphic and one has the property in question, then so does the other (number 
of induced subgraphs of a particular kind, existence of a Hamiltonian circuit, etc.).

It is known that the vertex-reconstruction conjecture fails for directed graphs;
infinitely many pairs of digraphs are known which are hypomorphic but not
isomorphic.²

There is also an edge-reconstruction conjecture, in which the information given
is the deck of edge-deleted subgraphs. It is also open. The largest known
counterexample is the pair of graphs Γ, Δ on four vertices, where Γ consists of a triangle and
an isolated vertex, and Δ is the ‘star’ $K_{1,3}$. There are many partial results, of which
the strongest is the theorem of Lovász and Müller, which shows that the conjecture
is true for the vast majority of graphs:


(20.2.1) Theorem. A graph with n vertices and more than $n \log_2 n$ edges is
edge-reconstructible.
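The four-vertex counterexample to edge-reconstruction mentioned above (a triangle with an isolated vertex, and the star $K_{1,3}$) can be checked by machine. This is a brute-force sketch with hypothetical function names; the canonical form tries all relabellings, so it is only suitable for tiny graphs.

```python
from itertools import permutations

def canonical_form(n, edges):
    """Smallest relabelled edge list over all permutations of 0..n-1 (brute force)."""
    best = None
    for perm in permutations(range(n)):
        form = tuple(sorted(tuple(sorted((perm[u], perm[v]))) for u, v in edges))
        if best is None or form < best:
            best = form
    return best

def edge_deck(n, edges):
    """The deck of edge-deleted subgraphs, each card up to isomorphism."""
    edges = list(edges)
    return sorted(
        canonical_form(n, edges[:i] + edges[i + 1:]) for i in range(len(edges))
    )

triangle_and_point = [(0, 1), (1, 2), (0, 2)]   # vertex 3 is isolated
star = [(0, 1), (0, 2), (0, 3)]                 # K_{1,3} with centre 0
```

In both graphs, deleting any edge leaves a path with two edges together with an isolated vertex, so `edge_deck(4, triangle_and_point)` and `edge_deck(4, star)` agree, while the two canonical forms differ.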


The edge-reconstruction conjecture can be formulated in the language of
permutation groups; the result of (20.2.1) extends to a general theorem about permutation
groups.


HIGHER REGULARITY CONDITIONS. 
A graph Γ is said to be strongly regular, or SR, with parameters (n, k, λ, μ), if the
following conditions hold:

• Γ has n vertices;

• Γ is regular with valency k;

• any two adjacent vertices have exactly λ common neighbours;

• any two non-adjacent vertices have exactly μ common neighbours.

2 The existence of a finite number of counterexamples is not regarded as invalidating a conjecture of
this kind, merely as modifying its statement.


We have seen some examples already. The Petersen graph is SR, with parameters
(10,3,0,1); indeed, any Moore graph of diameter 2 is SR (Section 11.12). The
complete bipartite graph $K_{k,k}$ is SR with parameters (2k, k, 0, k). Many other
examples exist. On the other hand, various necessary conditions are known for the
quadruple (n, k, λ, μ) to be parameters of a SR graph. A simple counting argument
shows that k(k - λ - 1) = (n - k - 1)μ. (Double-count edges joining a neighbour
of v to a non-neighbour.) Another powerful condition comes from linear algebra,
by the argument we used for Moore graphs in Section 11.12. However, there is still
a wide gap between the known necessary conditions and the sufficient conditions
(arising from explicit constructions). As a sample question, it is not known whether
or not a SR graph with parameters (99, 14, 1, 2) exists.
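The definition and the counting condition are easy to test on small examples. In this sketch (function names are mine), the Petersen graph is built in one standard way, with the 2-element subsets of a 5-set as vertices and disjoint pairs adjacent; the checker returns the parameters if the graph is strongly regular and `None` otherwise.

```python
from itertools import combinations

def strongly_regular_parameters(n, edges):
    """Return (n, k, lambda, mu) if the graph is strongly regular, else None."""
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    valencies = {len(a) for a in adj}
    # Numbers of common neighbours for adjacent and for non-adjacent pairs.
    lam = {len(adj[u] & adj[v])
           for u, v in combinations(range(n), 2) if v in adj[u]}
    mu = {len(adj[u] & adj[v])
          for u, v in combinations(range(n), 2) if v not in adj[u]}
    if len(valencies) == len(lam) == len(mu) == 1:
        return n, valencies.pop(), lam.pop(), mu.pop()
    return None

def petersen_edges():
    """Vertices are the 2-subsets of {0,...,4}; two are adjacent if disjoint."""
    pairs = list(combinations(range(5), 2))
    return [(i, j) for i, j in combinations(range(10), 2)
            if not set(pairs[i]) & set(pairs[j])]

if __name__ == "__main__":
    n, k, lam, mu = strongly_regular_parameters(10, petersen_edges())
    print((n, k, lam, mu), k * (k - lam - 1) == (n - k - 1) * mu)
```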

Strongly regular graphs are closely connected with topics in finite geometry (nets,
partial geometries), design theory, Euclidean geometry, permutation groups, and a
number of other areas.³

The definition can be unified and strengthened. For a positive integer t, we say
that the graph Γ is t-tuple regular if, for any set S of at most t vertices, the number
of common neighbours of the vertices in S depends only on the subgraph induced
on S. For t = 1 and for t = 2, this condition reduces precisely to regularity and
strong regularity, respectively; and the condition becomes stronger as t increases.
For large t, we have the following:


(20.2.2) Theorem. Let Γ be 5-tuple regular. Then Γ is one of the following: a disjoint
union of complete graphs of the same size; a regular complete multipartite graph
(i.e. the complement of the preceding); a pentagon; or the line graph of $K_{3,3}$. All
these graphs are t-tuple regular for all t.


Another variant is a weakening of the condition of strong regularity, to
distance-regularity. A connected graph Γ is distance-regular if, given integers j and k, the
number of vertices at distance j from vertex v and k from vertex w depends only
on the distance i between v and w. (In fact, only a small subset of these conditions
is required to guarantee the whole set.) A connected graph is strongly regular if
and only if it is distance-regular of diameter 2. However, it seems that
distance-regular graphs of large diameter are not so common, and there is some hope of a
classification of these. See the book Distance-Regular Graphs by A. E. Brouwer et
al. (1989) for further information.


RANDOM GRAPHS. 

Twice already (in Chapters 10 and 19) we've met the notion of a random graph,
whose edges are selected independently with probability 1/2 (so that all labelled graphs
are equally likely, if the number of vertices is finite). To develop a theory, we need
more flexibility! Two models are commonly used. Let n denote the number of
vertices.


FIRST MODEL. We choose edges independently with probability p, for some fixed p
with 0 < p < 1.


SECOND MODEL. We specify the number m of edges of the graph, and choose the set
of edges from the $\binom{n(n-1)/2}{m}$ possible m-sets (all such sets equally likely).


3 A surprising recent connection is with the theory of knot polynomials.


We examine the behaviour of a ‘typical’ graph as n → ∞, where p or m is equal
to a prescribed function of n. Of course, the two models are not the same; the
second always has exactly m edges, whereas the number of edges in the first has a
binomial distribution with mean pn(n - 1)/2. Nonetheless, it is not too surprising
that these models behave quite similarly, if m and p are related by m = pn(n - 1)/2.
For the sake of exposition, I'll use the second model.
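Both models are easy to sample from. The following sketch uses Python's standard `random` module; the function names are hypothetical, and graphs are returned as lists of edges on the vertices 0,...,n-1.

```python
import random
from itertools import combinations

def first_model(n, p, rng):
    """Include each of the n(n-1)/2 possible edges independently with probability p."""
    return [e for e in combinations(range(n), 2) if rng.random() < p]

def second_model(n, m, rng):
    """Choose a set of exactly m edges, all m-sets being equally likely."""
    return rng.sample(list(combinations(range(n), 2)), m)

if __name__ == "__main__":
    rng = random.Random(2024)          # fixed seed, for repeatability
    n, m = 50, 245
    p = m / (n * (n - 1) / 2)          # match the means: m = p n(n-1)/2
    counts = [len(first_model(n, p, rng)) for _ in range(5)]
    print("first model edge counts: ", counts)
    print("second model edge count: ", len(second_model(n, m, rng)))
```

The edge counts in the first model scatter around m, while the second model gives exactly m every time.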

One feature of the theory is that, for various properties P of graphs, there are
sharp thresholds. In other words, there is a function f such that, if m is a bit less
than f(n), then almost no graphs have property P (that is, the proportion of graphs
on n vertices which satisfy P tends to 0 as n → ∞), and if m is a little greater than
f(n), then almost all graphs (a proportion tending to 1) satisfy P. We say that P
holds almost surely if almost all graphs have P.

Two basic results illustrate these ideas. In both results, we consider random 
graphs with n vertices and m edges, according to the second model. 


(20.2.3) Proposition. Suppose that m ~ cn.

(a) If 0 < c < 1/2, then almost all graphs have the property that almost all
components are trees or unicyclic, the largest component having approximately
log n vertices.

(b) If c = 1/2, then almost surely the largest component has about $n^{2/3}$ vertices.

(c) If c > 1/2, then almost surely the largest component has size about c'n, for
some constant c' (depending on c).


(20.2.4) Proposition. Suppose that m ~ cn log n.
(a) If 0 < c < 1/2, then almost all graphs are disconnected.
(b) If c > 1/2, then almost all graphs are Hamiltonian.
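The jump in the size of the largest component can be observed experimentally. The sketch below samples graphs from the second model with m = cn for several values of c and reports the largest component, found with a union-find structure. The function names, the value n = 2000 and the seed are arbitrary choices; a single random sample illustrates the propositions but proves nothing.

```python
import random

def largest_component(n, edges):
    """Size of the largest connected component (n >= 1), via union-find."""
    parent = list(range(n))
    size = [1] * n

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            if size[ru] < size[rv]:
                ru, rv = rv, ru
            parent[rv] = ru                 # attach the smaller tree to the larger
            size[ru] += size[rv]
    return max(size[find(v)] for v in range(n))

def random_edge_set(n, m, rng):
    """A uniformly random set of m edges (second model), by rejecting repeats."""
    edges = set()
    while len(edges) < m:
        u, v = rng.sample(range(n), 2)
        edges.add((min(u, v), max(u, v)))
    return edges

if __name__ == "__main__":
    rng = random.Random(7)
    n = 2000
    for c in (0.25, 0.4, 0.5, 0.6, 1.0):
        m = int(c * n)
        print(f"c = {c:4.2f}  largest component: "
              f"{largest_component(n, random_edge_set(n, m, rng))}")
```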


Many similar results are known, and the ‘sharpness’ of the thresholds has been
greatly improved. One helpful way of describing the results is in terms of the
‘evolution’ of a random graph, as the number of edges is gradually increased. The
existence of thresholds shows that the process in some way resembles the ‘punctuated
equilibrium’ model of biological evolution,⁴ where short periods of rapid change are
interspersed with long stretches of relative uniformity.

Bollobás’ book Random Graphs (1985) gives a detailed account.


20.3. Computer software 


An essential part of the training of a statistician or numerical analyst involves the 
use of ‘standard’ computer software packages. A few years ago, I was asked what 
the equivalent packages in combinatorics were. I answered, ‘C or Pascal’. 

Today I would still give that answer, though now more by taste than by necessity.
In a general-purpose programming language, you can do anything; and you do not
pay the price in overheads associated with translation from one language into
another, or with ‘user-friendliness’.

4 Stephen Jay Gould, Ever Since Darwin (1977).


[If you do combinatorial computing in a general-purpose language, it is very
important to remember Wirth’s dictum, ‘Algorithms + Data Structures = Programs’.
The data structures you will use (perhaps large integers, partitions, permutations,
trees, graphs, families of sets) are not usually well represented by the built-in data
structures of the language (perhaps small integers, floating-point reals, characters,
and strings of characters). Time spent on designing well-adapted data structures will
not be wasted.]


Now it is increasingly common to find specialised systems which are useful
in combinatorics. This is particularly true for the material of Chapter 14. The
Schreier-Sims algorithm for getting information about the group generated by
a set of permutations is quite sophisticated, and can be integrated in a system
where it might use the output from an algorithm for finding generators for the
automorphism group of a graph, or from the Todd-Coxeter algorithm (which takes
as input generators and relations for a group G, and generators for a subgroup H,
and returns permutations generating the action of G on the coset space G : H).
The output from the Schreier-Sims algorithm might itself be subjected to further
group-theoretic analysis. Two integrated systems (both of which have much wider
capabilities) are the long-established CAYLEY (and its successor MAGMA), and the
newer and smaller GAP. These algorithms are also making their way into more
conventional ‘computer algebra’ systems.

Various packages for combinatorial optimisation are available. Often, these are 
centred around linear programming. However, dramatic new algorithms for finding 
approximate solutions to ‘hard’ (e.g. NP-complete) optimisation problems, such as 
simulated annealing and genetic algorithms, are becoming available. 

Finally, I should mention the language ISETL, designed for the purpose of
teaching discrete mathematics: see Baxter, Dubinsky and Levin, Learning Discrete
Mathematics with ISETL (1989). This language handles large integers, sets, sequences,
functions, etc., with a syntax almost identical to that used by a mathematician.⁵ It
is very easy to learn (no type declarations are required), and is freely available on a
wide range of personal computers and operating systems.


20.4. Unsolved problems 


This section presents some further problems which have guided the direction of
research in the past. Unlike the exercises in earlier chapters, you are not expected
to solve all of these.


GRAPH COLOURING. 
Despite the work of Appel and Haken, and of Robertson and Seymour, this area 
still abounds with hard problems. Here are three: 

• The Strong Perfect Graph Conjecture: a graph Γ and all its induced subgraphs
have clique number and chromatic number equal if and only if neither Γ nor
its complement contains an induced cycle of odd length greater than 3 (see
Section 18.4).

• Hadwiger’s Conjecture: a graph with chromatic number n has $K_n$ as a minor.
[This would imply the Four-Colour Theorem, since the class of planar graphs is
minor-closed and $K_5$ is not planar.]

• The List Colouring Conjecture: Let Γ have edge-chromatic number n. Suppose
that S is any set of ‘colours’ and, for each edge e of Γ, a list L(e) of n elements
of S is given. Then Γ can be edge-coloured using S, so that the colour of any
edge e belongs to L(e).


5 For example, in ISETL, one can define the Cartesian product of sets A and B to be

{ [x,y] : x in A, y in B}.


EXTREMAL SET THEORY.
Rather than mention specific problems, I will describe how a number of questions 
can be put into this framework. 

If we identify a binary word of length n with the subset of {1,...,n} of which
it is the characteristic function, then the ‘main problem’ of coding theory over the
binary alphabet (Section 17.4) takes the form: Given n and d, what is the largest
family $\mathcal{F}$ of subsets of {1,...,n} such that $|F_1 \triangle F_2| \ge d$ for all $F_1, F_2 \in \mathcal{F}$ with
$F_1 \ne F_2$?

A permutation π of {1,...,n} = N can be regarded as a subset S(π) of the
square array N × N containing exactly one element from each row or column. The
number of fixed points of $\pi_1^{-1}\pi_2$ is then $|S(\pi_1) \cap S(\pi_2)|$. So ‘metric’ questions about
permutations can be phrased in terms of families of sets of this special form.
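Both translations can be verified exhaustively for small n. In this sketch (all names are mine) a word is a tuple of bits and a permutation is a tuple giving the images of 0,...,n-1; the Hamming distance between two words is compared with the size of the symmetric difference of the corresponding sets, and the number of fixed points of $\pi_1^{-1}\pi_2$ with the size of the intersection of the corresponding sets of cells.

```python
from itertools import permutations, product

def as_subset(word):
    """The subset of {1,...,n} whose characteristic function is the word."""
    return frozenset(i + 1 for i, bit in enumerate(word) if bit)

def hamming_distance(u, v):
    """Number of coordinates in which the words u and v differ."""
    return sum(a != b for a, b in zip(u, v))

def as_cells(perm):
    """S(pi): the cells (i, pi(i)) of the square array, one in each row and column."""
    return frozenset(enumerate(perm))

def fixed_points_of_quotient(p1, p2):
    """Number of points fixed by the inverse of p1 followed or preceded by p2.

    Either way of composing gives the same count: the number of points
    with the same image under p1 and p2.
    """
    inverse = [0] * len(p1)
    for i, image in enumerate(p1):
        inverse[image] = i
    return sum(1 for i in range(len(p2)) if inverse[p2[i]] == i)

if __name__ == "__main__":
    words = list(product((0, 1), repeat=4))
    print(all(hamming_distance(u, v) == len(as_subset(u) ^ as_subset(v))
              for u in words for v in words))
    perms = list(permutations(range(4)))
    print(all(fixed_points_of_quotient(p, q) == len(as_cells(p) & as_cells(q))
              for p in perms for q in perms))
```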


DESIGN THEORY. 
Asking for less than a complete list of parameters of t-designs, we could pose the
problems:

• Do Steiner systems S(t, k, v) (or t-(v, k, 1) designs) exist for all t?

• Is there a projective plane whose order is not a prime power?

• Is there a Hadamard matrix of every order divisible by 4?

One can define q-analogues of t-(v, k, λ) designs: the blocks are k-dimensional
subspaces of a v-dimensional vector space over GF(q), and any t-dimensional
subspace lies in exactly λ blocks. Do non-trivial q-ary t-designs (without repeated
blocks) exist for all t? Or even for t = 4?


POSETS.
A question that continues to tantalise is the 1/3–2/3 problem. If x and y are incomparable
elements of the poset P, let π(x, y) be the proportion of linear extensions of P in
which x < y. Is it true that every poset which is not a chain contains elements x
and y with 1/3 ≤ π(x, y) ≤ 2/3?
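For a small poset the proportions can be computed by listing all linear extensions. The sketch below does this by brute force over all orderings, so it is practical only for a few elements; the function names, and the example poset (three elements a, b, c with the single relation a < b), are my own.

```python
from fractions import Fraction
from itertools import permutations

def linear_extensions(elements, relations):
    """All total orders of the elements compatible with the relations (a, b), meaning a < b."""
    extensions = []
    for order in permutations(elements):
        position = {x: i for i, x in enumerate(order)}
        if all(position[a] < position[b] for a, b in relations):
            extensions.append(order)
    return extensions

def proportion(elements, relations, x, y):
    """The proportion of linear extensions in which x comes below y."""
    extensions = linear_extensions(elements, relations)
    below = sum(1 for e in extensions if e.index(x) < e.index(y))
    return Fraction(below, len(extensions))

if __name__ == "__main__":
    elements, relations = "abc", [("a", "b")]
    print(linear_extensions(elements, relations))
    print(proportion(elements, relations, "a", "c"),
          proportion(elements, relations, "b", "c"))
```

This example has three linear extensions, and the proportions for the incomparable pairs (a, c) and (b, c) are exactly 2/3 and 1/3, so the bounds in the question cannot be narrowed.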

The problem of finding the cardinality of the free distributive lattice on n 
generators was mentioned in Section 12.3. 


ENUMERATION. 

A glance at the Handbook of Integer Sequences, by Neil Sloane, shows a number 
of combinatorial counting sequences of which only a few terms are known. Each 
poses the problem of finding either a general formula or more terms. Sloane himself
mentions several, including various kinds of ‘polyominoes’, non-attacking queens, 
polytopes, Latin squares, linear spaces (families of sets with any two points in a 
unique set of the family), and knots. 


MISCELLANEA. 
• Is there a perfect 1-error-correcting code over an alphabet not of prime-power
size (Section 17.6)?
• Is there a Moore graph of diameter 2 and valency 57 (Section 11.12)?
• A conjecture of Isbell: Let $n = 2^a b$ with b odd. If a is sufficiently large compared
to b, then an intersecting family of subsets of {1,...,n} with cardinality $2^{n-1}$
cannot be invariant under a transitive group of permutations of {1,...,n}.


20.5. Further reading 


I expect that, if you’ve read this far, you are feeling that on some topics I stopped 
just as things were getting interesting, while on others I said more than anyone 
would reasonably want to know. But I can’t predict which topics will fall into which 
class. This section should help you explore further. 

The authors of a book have to make some compromise between coverage and 
exposition. The result will lie at some point on the scale between light bedtime 
reading and an encyclopedia. Where I list two books, I have tried to put the 
textbook before the reference book. 


GENERAL. 

There are a number of general combinatorics books. Those which go beyond the 
introductory material tend to reflect their authors’ interests. For example, M. Hall's 
Combinatorial Theory (1986) is strong on codes and designs, as well as the asymptotics 
of the partition function and the proof of the Van der Waerden Conjecture. Other 
books include L. Comtet, Advanced Combinatorics (1974), J. Riordan, An Introduction 
to Combinatorial Analysis (1958), H. J. Ryser, Combinatorial Mathematics (1963), and 
the recent book by J. H. van Lint and R. M. Wilson, A Course on Combinatorics. 

The Handbook of Combinatorics, due out soon, will contain commissioned
surveys on all parts of combinatorics. Another good source of more specialised surveys
is the Proceedings of the biennial British Combinatorial Conference. Speakers are 
invited to survey their subject areas, and the papers are published in advance of the 
meetings. These have appeared in the London Mathematical Society Lecture Note 
Series since 1981, and most topics have been covered. 

Another very useful book is L. Lovász’s Combinatorial Problems and Exercises
(1979), a vast collection of problems ranging from routine exercises to major
theorems. The book is in three parts: Problems, Hints, and Solutions; the third part is
the longest!


ENUMERATION. 

F. Harary and E. M. Palmer, Graphical Enumeration (1973) is a good exposition; I. P. 
Goulden and D. M. Jackson, Combinatorial Enumeration (1983) contains everything 
you need about manipulating generating functions. 

One book which is indispensable to enumerators is N. Sloane’s A Handbook of
Integer Sequences (1973) (a new edition is under way). This is a list of over 2000
sequences in lexicographic order, with detailed bibliographic information about each
one. If the sequence you've just discovered counting ultra-hyperbolic flim-flams has
been found before, chances are you'll find it in here, with references.


SYMMETRIC FUNCTIONS.

This subject connects with enumeration on one side and with representation theory on the
other. R. P. Stanley’s Ordered Structures and Partitions (1972) is on the combinatorial
side; I. G. Macdonald’s Symmetric Functions and Hall Polynomials (1979) is
condensed but clear.


FAMILIES OF SETS.
I. Anderson, Combinatorics of Finite Sets (1987) and B. Bollobás, Combinatorics
(1986) are recommended.


TRANSVERSAL THEORY (SDRs).
L. Mirsky, Transversal Theory (1971); L. Lovász and M. D. Plummer, Matching
Theory (1986).


RAMSEY THEORY.
R. L. Graham, B. L. Rothschild and J. Spencer, Ramsey Theory (1990), is
wide-ranging and readable. P. Erdős et al., Combinatorial Set Theory (1977), is slanted
towards the infinite.


LATIN SQUARES.
See J. Dénes and A. D. Keedwell, Latin Squares and their Applications (1974). 


DESIGN THEORY.
Two books with the title Design Theory, both appearing in 1985, are the textbook
by D. R. Hughes and F. C. Piper, and the tome by T. Beth, D. Jungnickel and H.
Lenz (which gives details of many recursive constructions).


GEOMETRY. 

The classics are E. Artin, Geometric Algebra (1957), and J. Dieudonné, La Géométrie
des Groupes Classiques (1955). A more recent account is in my lecture notes, 
Projective and Polar Spaces (1992), available from the School of Mathematical 
Sciences, Queen Mary and Westfield College. For finite projective geometries, the 
three-volume series by J. W. P. Hirschfeld, Projective Geometries over Finite Fields 
(1979, 1985, 1991 — the last with J. A. Thas), is definitive. 


PERMUTATION GROUPS. 
H. Wielandt, Finite Permutation Groups (1964), and D. S. Passman, Permutation 
Groups (1968). T. Tsuzuku, Finite Groups and Finite Geometries (1982) and N. L. 
Biggs and A. T. White, Permutation Groups and Combinatorial Structures (1979) deal 
particularly with the relations to combinatorics. No satisfactory account of the 
situation post-Classification (of finite simple groups) has appeared. 

A class of infinite permutation groups with particular links with combinatorics
is discussed in my book Oligomorphic Permutation Groups (1990).


CODES. 

Start with R. Hill, A First Course in Coding Theory (1986) or J. H. van Lint,
Introduction to Coding Theory (1982). The encyclopedia is F. J. MacWilliams and
N. J. A. Sloane, The Theory of Error-Correcting Codes (1977). For the relation with
information theory, see C. M. Goldie and R. G. E. Pinch, Communication Theory
(1991); for cryptography, see D. J. A. Welsh, Codes and Cryptography (1988). The
title of P. J. Cameron and J. H. van Lint’s Designs, Graphs, Codes and their Links is
self-explanatory.


ORDERS.
B. A. Davey and H. A. Priestley, Introduction to Lattices and Order (1990).


MATROIDS.
V. W. Bryant and H. Perfect, Independence Theory in Combinatorics (1980), sets the
scene for the comprehensive treatment by D. J. A. Welsh, Matroid Theory (1976).


GRAPH THEORY.

There is a wide choice of books at many levels. R. J. Wilson’s Introduction to Graph
Theory is just that. For more specialised topics, see B. Bollobás, Extremal Graph
Theory (1978); B. Bollobás, Random Graphs (1985); A. E. Brouwer, A. M. Cohen
and A. Neumaier, Distance-Regular Graphs (1989); or several of the books referred
to [...]. The surveys edited by L. W. Beineke and R. J. Wilson, three on
Selected Topics in Graph Theory (1974, 1977, 1988) and one on Applications of Graph
Theory (1979), give a wide coverage of the [...]


INFINITE COMBINATORICS.

No text-book is devoted exclusively to this. Most books include it [...]
D. König’s early classic Theorie der endlichen und unendlichen Graphen
([...] reprint) is not prejudiced towards the finite. A recent conference proceedings
edited by R. Diestel, Directions in Infinite Graph Theory and Combinatorics (1991),
[...]


Answers to selected exercises 


CHAPTER 2, EXERCISE 5. There are 80 unlabelled families. 


CHAPTER 2, EXERCISE 12. (i) Let $m = a_0 + 2a_1 + \cdots + 2^{d-1}a_{d-1}$. Numbering the first row as zero,
the two entries in the i-th row are $a_i + 2a_{i+1} + \cdots + 2^{d-1-i}a_{d-1}$ and $2^i n$. The first of these is odd if
and only if $a_i = 1$; so we add the values $2^i n$ for which $a_i = 1$, that is, we calculate $\sum 2^i a_i n = mn$.

(ii) If n is written in base 2, then doubling has the effect of shifting it one place to the left; so
the terms added are exactly those occurring in the standard long multiplication done in base 2.

(iii) With these modifications, we replace $2^i n$ by $n^{2^i}$, so that the final result is indeed $n^m$. The
method requires $2\lfloor \log_2 m \rfloor$ multiplications at most, viz. $\lfloor \log_2 m \rfloor$ squarings and then at most this
number of multiplications in the last step, since m has $1 + \lfloor \log_2 m \rfloor$ digits in base 2.
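The method of halving and doubling, and its modification for powers, can be sketched as follows. The function names are mine, and the multiplication count in the second function treats the first factor as a copy rather than a multiplication, which is one reasonable reading of the bound in part (iii).

```python
def peasant_multiply(m, n):
    """Compute m*n by halving m, doubling n, and adding the rows where m is odd."""
    total = 0
    while m > 0:
        if m % 2 == 1:          # binary digit a_i of m equals 1
            total += n          # n currently holds 2^i times the original n
        m //= 2
        n *= 2
    return total

def peasant_power(n, m):
    """Return (n**m, number of multiplications used), for m >= 1.

    Doubling is replaced by squaring and addition by multiplication.
    """
    result = None
    multiplications = 0
    while m > 0:
        if m % 2 == 1:
            if result is None:
                result = n                  # first factor: a copy, not a product
            else:
                result *= n
                multiplications += 1
        m //= 2
        if m > 0:
            n *= n                          # one squaring per remaining digit
            multiplications += 1
    return result, multiplications

if __name__ == "__main__":
    print(peasant_multiply(37, 41), peasant_power(3, 13))
```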


CHAPTER 3, EXERCISE 10. 523 words can be made with these letters. 


CHAPTER 3, EXERCISE 15. We get a 1-factor by writing n/2 boxes each with room for two entries,
and filling them with the elements 1,...,n. These can be written in the boxes in n! ways. However,
permuting the boxes (in k! ways), or the elements within the boxes (in $2^k$ ways) doesn't change the
1-factor. The product of these numbers is 2.4.6...(2k); dividing, we obtain 1.3.5...(2k-1) = (2k-1)!!
for the number of 1-factors.

(b) Suppose that the k-set A is exchanged with its complement B by a permutation. Then
elements of A and B alternate around each cycle, which thus has even length. Conversely, if all
cycles have even length, we may colour the elements in each cycle alternately red and blue; the red 
and blue sets are then exchanged. 

From a permutation with all cycles even we obtain a pair of 1-factors as follows. A 2-cycle is 
assigned to both 1-factors; in a longer cycle, the consecutive pairs are assigned alternately to the 
two 1-factors. The process isn't unique, since the starting point isn't specified; indeed, a permutation
gives rise to $2^d$ ordered pairs of 1-factors, where d is the number of cycles of length greater than 2.

Conversely, let a pair of 1-factors be given. Their union is a graph with all vertices of valency
2, thus a disjoint union of circuits, all of even length (since the 1-factors alternate around a circuit).
We take these circuits to be the cycles of a permutation. In fact, for a circuit of length greater than
2, there are two choices for the direction of traversal. So the number of permutations obtained is $2^d$,
where d is as before.
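(c) The proportion is

$$\frac{((2k-1)!!)^2}{(2k)!} = \prod_{i=1}^{k}\left(1 - \frac{1}{2i}\right)
< \prod_{i=1}^{k} e^{-1/(2i)}
= e^{-\frac{1}{2}\sum_{i=1}^{k} 1/i}
= e^{-\frac{1}{2}\log k + O(1)} = O(k^{-1/2}).$$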

CHAPTER 3, EXERCISE 16. $2^{n^2}$. (a) $2^{n(n-1)}$, (b) $2^{n(n+1)/2}$, (c) $2^{n(n-1)/2}$, (d) $3^{n(n-1)/2}$.


CHAPTER 3, EXERCISE 19. The numbers are (a) 29, (b) 13, (c) 19, (d) 6. (The numbers of unlabelled
structures are 9, 4, 5, 1 respectively.)


CHAPTER 4, EXERCISE 10. 


_ [ (41 —1)/3, neven, 
fn) = T - AY n odd. 


CHAPTER 4, EXERCISE 11. (b) The relationship is u(n) - u(n - 1) = a(n)/2 for n > 2.


CHAPTER 4, EXERCISE 16. Imagine that the clown wears a diving suit, and continues to draw balls
even after he gets wet. There are $\binom{2n}{n}$ ways in which the balls could be drawn. The clown stays dry if
and only if the number of red balls never exceeds the number of blue ones; according to the voting
interpretation, the number of ways in which this happens is $C_{n+1}$. The ratio is 1/(n + 1).

For a harder exercise, find a direct proof of this exercise, and reverse the above argument to 
deduce the formula for the Catalan numbers. 
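The ratio 1/(n + 1) can be checked by enumeration for small n. The sketch assumes, as the count of possible drawings suggests, that there are n red and n blue balls, and that the clown gets wet as soon as more red balls than blue have been drawn; the function name is mine.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def dry_sequences(n):
    """Count the orders of drawing n red and n blue balls in which the
    number of red balls drawn never exceeds the number of blue ones."""
    count = 0
    for red_positions in combinations(range(2 * n), n):
        red = set(red_positions)
        balance = 0                      # blue drawn minus red drawn so far
        stays_dry = True
        for draw in range(2 * n):
            balance += -1 if draw in red else 1
            if balance < 0:              # more red than blue: the clown gets wet
                stays_dry = False
                break
        count += stays_dry
    return count

if __name__ == "__main__":
    for n in range(1, 8):
        print(n, dry_sequences(n), comb(2 * n, n),
              Fraction(dry_sequences(n), comb(2 * n, n)))
```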


CHAPTER 4, EXERCISE 19. The recurrence can be written as

$$\sum_{k=0}^{n+1} \binom{n+1}{k} B_k = B_{n+1} \quad \text{for } n \ge 1.$$


Multiplying by ¢*+!/(n + 1)! and summing over n, the two sides are f(t)exp(t) and f(t) with 
the constant and linear terms omitted. Thus f(t)exp(t) — 1- t+ it = f(t) - 1+ bt, whence 
f(t) = t/(exp(t) — 1), as required. 
Now f(t) + 4 = d(exp(42) +exp(-4t))/(exp(4t) — exp(—42)) = $t coth 32, an even function. 
The last recurrence obviously has the solution 6, = (—1)*. 
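
A short computation with exact rationals illustrates both conclusions. This is a sketch that assumes the standard normalisation B_0 = 1 and the recurrence in the equivalent form sum_{k=0}^{n} C(n+1, k) B_k = 0 for n >= 1; the function name is illustrative.

```python
from fractions import Fraction
from math import comb

def bernoulli_numbers(m):
    """Return [B_0, ..., B_m] from sum_{k=0}^{n} C(n+1, k) B_k = 0 (n >= 1), B_0 = 1."""
    B = [Fraction(1)]
    for n in range(1, m + 1):
        partial = sum(comb(n + 1, k) * B[k] for k in range(n))
        # The k = n term is C(n+1, n) B_n = (n+1) B_n, so solve for B_n.
        B.append(-partial / (n + 1))
    return B

B = bernoulli_numbers(12)
print(B[:5])
# Evenness of f(t) + t/2 means every odd-index B_n with n >= 3 vanishes.
print([B[n] for n in range(3, 13, 2)])
```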


CHAPTER 5, EXERCISE 1. The pollster is either dishonest or incompetent.


CHAPTER 5, EXERCISE 11. (a) The identity has n cycles, a single cycle just one, and multiplying by
a transposition changes the number by 1. So at least n − 1 transpositions are required. The proof
of (5.5.2) shows that this number is achieved if and only if each transposition involves points lying
in different cycles of the product so far. This is equivalent to saying that the transpositions are
edges of a connected graph without cycles, a tree (compare the discussion in Section 11.3). There are
$n^{n-2}$ trees on {1,...,n}, and (n − 1)! orders to choose the n − 1 edges of such a tree. So there are
$n^{n-2}(n-1)!$ tree-cycle pairs. Each of the (n − 1)! cycles occurs equally often (they are all conjugate),
necessarily $n^{n-2}$ times.
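
The final count can be confirmed by exhaustive search for small n. The sketch below counts sequences of n − 1 transpositions whose product is one fixed n-cycle; the helper names and the left-to-right composition convention are choices made here, not taken from the text.

```python
from itertools import combinations, product

def transposition(n, i, j):
    """The permutation of range(n) swapping i and j, as a tuple of images."""
    images = list(range(n))
    images[i], images[j] = images[j], images[i]
    return tuple(images)

def compose(p, q):
    """Apply p first, then q."""
    return tuple(q[p[x]] for x in range(len(p)))

def count_minimal_factorisations(n):
    """Count (n-1)-tuples of transpositions whose product is the cycle x -> x+1 mod n."""
    target = tuple((x + 1) % n for x in range(n))
    transpositions = [transposition(n, i, j) for i, j in combinations(range(n), 2)]
    count = 0
    for sequence in product(transpositions, repeat=n - 1):
        perm = tuple(range(n))
        for t in sequence:
            perm = compose(perm, t)
        count += perm == target
    return count

for n in (3, 4, 5):
    print(n, count_minimal_factorisations(n), n ** (n - 2))
```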


CHAPTER 6, EXERCISE 2. We look for properties of the group multiplication tables unaltered by row
and column permutations. There are two groups of order 4, the cyclic group and the Klein group.
The second, but not the first, has the property that given any two rows and any column, there is a 
(unique) second column so that the entries in these rows and columns form a Latin subsquare of 
order 2. This property is preserved by row and column permutations. So the two multiplication tables 
are inequivalent. Since there are only two inequivalent Latin squares, both are group multiplication 
tables. 

There is only one group of order 5, the cyclic group; its multiplication table has the property 
that any two rows ‘differ’ by a cyclic permutation. It is not hard to construct a Latin square of order 
5 which does not have this property. 
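
The order-2 subsquare property used for order 4 is easy to test mechanically. In this sketch the cyclic group is represented by addition mod 4 and the Klein group by bitwise XOR on {0, 1, 2, 3}; these encodings and the function name are assumptions made for illustration.

```python
def has_subsquare_property(table):
    """True if, for any two rows and any column, exactly one second column
    completes a Latin subsquare of order 2 on those rows."""
    n = len(table)
    for r1 in range(n):
        for r2 in range(r1 + 1, n):
            for c1 in range(n):
                partners = [
                    c2 for c2 in range(n)
                    if c2 != c1
                    and table[r1][c1] == table[r2][c2]
                    and table[r1][c2] == table[r2][c1]
                ]
                if len(partners) != 1:
                    return False
    return True

cyclic = [[(i + j) % 4 for j in range(4)] for i in range(4)]
klein = [[i ^ j for j in range(4)] for i in range(4)]

print(has_subsquare_property(klein), has_subsquare_property(cyclic))
```

Because the test only refers to rows, columns and equality of entries, its outcome does not change when rows or columns are permuted, which is the invariance the argument relies on.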


CHAPTER 6, EXERCISE 4. (a) {{1,2, 3}, {1, 2}, {1,3}}. 
(b) 24. 


CHAPTER 7, EXERCISE 4. Hint: Use Lucas’ Theorem (Section 3.4). 


CHAPTER 7, EXERCISE 8. (a) The minimal sets of any family form a Sperner family. 

(b) If Z is minimal with respect to meeting every set F′ \ F, for F′ ∈ ℱ, F′ ≠ F, then Y = {y} ∪ Z
has this property. [Why does such a Z exist?]

(c) If F ∈ ℱ, then F meets every set of b(ℱ); by (b), if we omit a point of F, this is no longer
true, so F is minimal. Thus ℱ ⊆ b(b(ℱ)). To prove the reverse inclusion, it suffices to show that a
set meeting every member of b(ℱ) contains a member of ℱ, or equivalently, a set Z containing no
member of ℱ is disjoint from some member of b(ℱ). But if Z is such a set, then X \ Z meets every
member of ℱ, so X \ Z contains a member of b(ℱ), as required.

(d) The first part is clear, since an (n + 1 − k)-set meets every k-set but an (n − k)-set is disjoint
from some k-set.

ℱ_0 = {∅}, and no set meets ∅, so b(ℱ_0) = ∅, the empty family. (Note that any set has the
property that it meets every set in ∅, and the unique minimal set is ∅; so b(∅) = {∅}, in accordance
with (c).)
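
The identity b(b(ℱ)) = ℱ and the boundary cases in (d) can be explored on small examples. The following is a brute-force sketch, taking b(ℱ) to be the family of minimal sets meeting every member of ℱ, as parts (c) and (d) indicate; the function names and the example family are illustrative.

```python
from itertools import combinations

def minimal_sets(family):
    """Keep only those sets of the family that contain no other member properly."""
    fam = set(map(frozenset, family))
    return {s for s in fam if not any(t < s for t in fam)}

def blocker(family, ground):
    """Minimal subsets of `ground` that meet every member of `family`."""
    fam = [frozenset(s) for s in family]
    hitting = []
    for size in range(len(ground) + 1):
        for candidate in combinations(sorted(ground), size):
            candidate = frozenset(candidate)
            if all(candidate & member for member in fam):
                hitting.append(candidate)
    return minimal_sets(hitting)

ground = {1, 2, 3, 4}
sperner = {frozenset({1, 2}), frozenset({2, 3}), frozenset({3, 4})}
print(blocker(sperner, ground))
print(blocker(blocker(sperner, ground), ground) == sperner)
```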

CHAPTER 8, EXERCISE 2. 1 + z is a primitive element; its fourth power is 1 − z. So coset representatives
are 1, 1 + z, $(1+z)^2 = 3z$, and $(1+z)^3 = 2 + z$.
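
These identities can be checked by direct arithmetic. The sketch below assumes the field is GF(25), written as pairs a + bz over the integers mod 5 with z^2 = z − 1; that relation is inferred here from the stated values of the powers (it is what makes (1 + z)^2 = 3z and (1 + z)^3 = 2 + z hold) and is not quoted from the exercise.

```python
P = 5  # assumed characteristic

def mul(u, v):
    """Multiply a + b z and c + d z, using the assumed relation z^2 = z - 1."""
    a, b = u
    c, d = v
    # (a + bz)(c + dz) = ac + (ad + bc) z + bd z^2 = (ac - bd) + (ad + bc + bd) z
    return ((a * c - b * d) % P, (a * d + b * c + b * d) % P)

def power(u, exponent):
    result = (1, 0)
    for _ in range(exponent):
        result = mul(result, u)
    return result

def order(u):
    """Multiplicative order of a non-zero element."""
    k, current = 1, u
    while current != (1, 0):
        current = mul(current, u)
        k += 1
    return k

g = (1, 1)  # the element 1 + z
print([power(g, e) for e in range(1, 5)])
print(order(g))
```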

CHAPTER 8, EXERCISE 11. Let z be a point outside Y. Then z lies in (n − 1)/2 triples. But a triple
containing z and a point of Y contains only one point of Y (since a triple with two points in Y is
contained in Y); there are m triples of this form. Thus (n − 1)/2 ≥ m.

Equality holds if and only if every triple through any point outside Y meets Y, i.e. no triple is
disjoint from Y.
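
CHAPTER 9, EXERCISE 2. There are $q^{n^2}$ matrices; of these, $\prod_{i=0}^{n-1}(q^n - q^i)$ are non-singular; so the
probability is

\[
\prod_{k=0}^{n-1} \left(1 - \frac{1}{q^{k+1}}\right)
\]

(putting k = n − 1 − i). This is a decreasing function of n, so tends to a limit c(q), which is clearly
less than 1. Now

\[
\log c(q) = \sum_{k \ge 0} \log\left(1 - \frac{1}{q^{k+1}}\right)
  \ge \sum_{k \ge 0} \frac{1}{q^{k}} \log\left(\frac{q-1}{q}\right)
  = \frac{q}{q-1} \log\left(\frac{q-1}{q}\right),
\]

since the curve y = log(1 − x) lies above the line segment from (0, 0) to (1/q, log((q − 1)/q)). So
$c(q) \ge ((q-1)/q)^{q/(q-1)} > 0$.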
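
For small cases the probability can be compared with a direct count. This sketch restricts to prime q so that arithmetic mod q gives the field; the function names are illustrative, and the modular inverse via pow(x, -1, p) needs Python 3.8 or later.

```python
from fractions import Fraction
from itertools import product

def det_mod(matrix, p):
    """Determinant over GF(p), p prime, by Gaussian elimination."""
    m = [row[:] for row in matrix]
    n = len(m)
    det = 1
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] % p), None)
        if pivot is None:
            return 0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det = det * m[col][col] % p
        inverse = pow(m[col][col], -1, p)
        for r in range(col + 1, n):
            factor = m[r][col] * inverse % p
            for c in range(col, n):
                m[r][c] = (m[r][c] - factor * m[col][c]) % p
    return det % p

def count_nonsingular(q, n):
    """Brute-force count of non-singular n x n matrices over GF(q), q prime."""
    total = 0
    for entries in product(range(q), repeat=n * n):
        rows = [list(entries[i * n:(i + 1) * n]) for i in range(n)]
        total += det_mod(rows, q) != 0
    return total

def probability(q, n):
    """The product formula for the proportion of non-singular matrices."""
    result = Fraction(1)
    for k in range(n):
        result *= 1 - Fraction(1, q ** (k + 1))
    return result

print(count_nonsingular(2, 3), probability(2, 3) * 2 ** 9)
print(float(probability(2, 40)), ((2 - 1) / 2) ** (2 / (2 - 1)))
```

The second printed line compares a long partial product for q = 2 with the lower bound from the convexity argument.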

CHAPTER 9, EXERCISE 11. Let $p_1, \ldots, p_5$ be five points, no three collinear. Then non-zero vectors
spanning the first three points form a basis, and the other two points have all their coordinates non-zero
relative to this basis. (If $p_4$ had its third coordinate zero, then $p_1, p_2, p_4$ would be dependent.)
Multiplying the basis vectors by suitable scalars, we can assume that $p_4 = [1, 1, 1]$. Then $p_5 = [1, \alpha, \beta]$,
where $1 \ne \alpha \ne \beta \ne 1$ (to ensure the independence of $p_i, p_4, p_5$ for i = 1, 2, 3).

Now the general second degree equation is

\[
ax^2 + by^2 + cz^2 + fyz + gzx + hxy = 0.
\]

If $p_1, p_2, p_3$ lie on this curve, then a = b = c = 0. Substituting the coordinates of $p_4$ and $p_5$ gives

\[
f + g + h = 0, \qquad f + g/\alpha + h/\beta = 0.
\]

These equations are independent, so the solution is unique up to scalar multiple (and so defines a
unique conic).

Counting arguments show that the number of choices of five points in PG(2, q) with no three
collinear is

\[
(q^2+q+1)\,(q^2+q)\,q^2\,(q-1)^2\,(q-2)(q-3).
\]

A conic has q + 1 points with no three collinear; five points can be chosen in

\[
(q+1)\,q\,(q-1)(q-2)(q-3)
\]

ways. Division gives the result.
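
The count of five-point sets with no three collinear can be verified by exhaustive search in a small plane. The sketch below handles prime q only (so that integers mod q form the field), counts unordered sets, and compares with the ordered count divided by 5!; all function names are illustrative.

```python
from itertools import combinations
from math import factorial

def points(q):
    """One normalised coordinate vector per point of PG(2, q), q prime."""
    pts = [(1, y, z) for y in range(q) for z in range(q)]
    pts += [(0, 1, z) for z in range(q)]
    pts.append((0, 0, 1))
    return pts

def det3(a, b, c, q):
    """3 x 3 determinant mod q; zero exactly when the three points are collinear."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - a[1] * (b[0] * c[2] - b[2] * c[0])
            + a[2] * (b[0] * c[1] - b[1] * c[0])) % q

def count_five_sets(q):
    """Number of unordered 5-sets of points with no three collinear."""
    pts = points(q)
    indices = range(len(pts))
    collinear = {
        t for t in combinations(indices, 3)
        if det3(pts[t[0]], pts[t[1]], pts[t[2]], q) == 0
    }
    return sum(
        all(t not in collinear for t in combinations(s, 3))
        for s in combinations(indices, 5)
    )

def ordered_formula(q):
    return (q * q + q + 1) * (q * q + q) * q * q * (q - 1) ** 2 * (q - 2) * (q - 3)

q = 5
print(count_five_sets(q) * factorial(5), ordered_formula(q))
# Quotient of the two displayed counts, compared with q^5 - q^2:
print(ordered_formula(q) // ((q + 1) * q * (q - 1) * (q - 2) * (q - 3)), q ** 5 - q ** 2)
```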


CHAPTER 10, EXERCISE 1. Let $a_{ij}$ and $b_{ij}$ be the heights of the soldiers in row i and column j after
the first and second rearrangements. Then $a_{ij} > a_{i,j+1}$. Also, $a_{ij} = b_{kj}$ if and only if $a_{ij}$ is the kth
largest number in column j. Suppose that $b_{ij} < b_{i,j+1}$, and let x be a number lying between these
two values. Then fewer than i soldiers in column j have heights exceeding x, but i or more soldiers
in column j + 1 have heights exceeding x. But this is a contradiction, since each soldier in column j
(before the second rearrangement) is taller than his neighbour in column j + 1.
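
The statement is easy to test on random data. This sketch assumes the first rearrangement sorts each row into decreasing order of height and the second sorts each column likewise, which is how the solution reads; the function names are hypothetical.

```python
import random

def rearrange(grid):
    """Sort every row in decreasing order, then every column in decreasing order."""
    rows_sorted = [sorted(row, reverse=True) for row in grid]
    columns_sorted = [sorted(col, reverse=True) for col in zip(*rows_sorted)]
    return [list(row) for row in zip(*columns_sorted)]

def rows_decreasing(grid):
    return all(row[j] >= row[j + 1] for row in grid for j in range(len(row) - 1))

random.seed(0)
for _ in range(1000):
    grid = [[random.randint(150, 200) for _ in range(6)] for _ in range(4)]
    assert rows_decreasing(rearrange(grid))
print("rows stay sorted after sorting the columns")
```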


CHAPTER 10, EXERCISE 6. It is important to justify the ‘by symmetry ...’ in the argument. The colouring is
unchanged if we add a fixed residue mod 17 to everything, or if we multiply everything by ±1, ±2, ±4
or ±8 (these are all the quadratic residues mod 17). Suppose that there is a red 4-set {a, b, c, d}. By
adding −a, we can assume that a = 0; by multiplying by 1/b, we can assume that b = 1. Now the
red neighbours of 0 are 1, 2, 4, 8, 9, 13, 15, 16; the red neighbours of 1 are 0, 2, 3, 5, 9, 10, 14, 16. So
c and d are chosen from 2, 9 and 16. But all edges between these three points are blue.

Furthermore, multiplication by any fixed quadratic non-residue maps red edges to blue ones and
vice versa. So, if there were a blue 4-set, there would be a red one as well.
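
The conclusion can also be confirmed by checking all 4-sets directly. The sketch assumes the colouring described in the solution: the edge {u, v} of the complete graph on the residues mod 17 is red when u − v is a non-zero quadratic residue and blue otherwise.

```python
from itertools import combinations

N = 17
RESIDUES = {(x * x) % N for x in range(1, N)}  # non-zero quadratic residues mod 17

def is_red(u, v):
    """Red edge: the difference is a quadratic residue (symmetric, as -1 is a residue)."""
    return (u - v) % N in RESIDUES

def monochromatic_4_sets():
    found = []
    for subset in combinations(range(N), 4):
        colours = {is_red(u, v) for u, v in combinations(subset, 2)}
        if len(colours) == 1:
            found.append(subset)
    return found

print(sorted(RESIDUES))
print(monochromatic_4_sets())
```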


CHAPTER 11, EXERCISE 1. (a) 13; (b) 10; (c) 3; (d) 4; (e) 8; (f) 13.


CHAPTER 11, EXERCISE 13. Let $A_i$ be the set of neighbours of i. By assumption, $|A_i \cap A_j| = 1$ for
i ≠ j. If $|A_1| = n - 1$, then 1 is joined to all other vertices. Now the remaining vertices are paired
up, since any friend of i (other than 1) is a common friend of 1 and i; the graph is a windmill.

If this doesn’t happen, then $|A_i| = k$ for all i, by the De Bruijn-Erdős theorem. Now the
adjacency matrix A satisfies $A^2 = (k-1)I + J$, where J is the all-one matrix. It has the eigenvalue
k with multiplicity 1, corresponding to the all-1 eigenvector; any other eigenvalue α satisfies
$\alpha^2 = k - 1$, so the eigenvalues are $\pm\sqrt{k-1}$, with multiplicities f and g, say. The trace of A is zero,
so $k + (f - g)\sqrt{k-1} = 0$.

From the last equation, it is impossible that f = g. So $k = u^2 + 1$ for some integer u, and
$f - g = -(u^2+1)/u$. Since this is an integer, we have u = 1, whence k = 2 and the graph is a
triangle (which is a special case of a windmill).


surjective mappings, 77

Sylvester type (Hadamard matrix), 268, 270 
symmetric design, 264
symmetric difference, 16 
symmetric function, 220 
symmetric group, 70, 229 
symmetric polynomial, 220-224, 255-256 
symmetric relation, 35, 160 
syndrome decoding, 283 
syndrome, 283 
system of distinct representatives, 88-90, 98, 185, 203, 270, 292, 318-319, 324


T 


t-design, 257, passim 
t-transitive, 238 
tableau, 217, passim 
Tall, D. O., 286 
tangent, 139 

target, 173 

Tarry, G., 3, 286 
tautology, 194 
tensor product, 268 
ternary Golay code, 287
Thas, J. A., 336 


Todd-Coxeter algorithm, 333 
topological graph theory, 299-304 
topology, 1, 37-38, 177 

torus, 302 

total order, 36, 188 

totient, 251 

tournament schedule, 119, 299 
‘Trackwords’, 186 

trail, 161 

transfinite induction, 310 

transitive group, 232 

transitive relation, 35, 187 

Travelling Salesman Problem, 170, 185 
tree, 38-39, 86, 162 

tree, binary, 72 

trek, 161 

triangle inequality, 171-172, 180 
trivalent, 162 

trivial design, 265 

truth table, 194 

Tsuzuku, T., 336 

Turing machine, 326 

‘Twenty Questions’, 67, 72, 271 
Twice-round-the-tree Algorithm, 171, 185 
two-level poset, 207 

two-line notation for permutation, 29 


U 


union, 16 

unique factorisation, 64 
universal graph, 322-323 
unlabelled structure, 14, 234 
unordered pair, 17 


V 


valency, 162 

valuation, 194 

value of flow, 173 

van der Waerden permanent conjecture, 94 
van Lint, J. H., 335, 336, 337 

variance trick, 261 

Varshamov-Gilbert bound, 278-279, 281-282 
Veblen-Young Theorem, 130

Venn diagram, 76 

vertebrate, 38 

vertex, 159 

vertex-weighted graph, 162 

Vilenkin, N. Ya, 307, 310 

Vizing’s Theorem, 298 

von Neumann, J., 21 


W

walk, 160 

Watkins, J. J., 304 

weakly connected digraph, 173 
weight function, 162, 249 
weight, of codeword, 281 
Welsh, D. J. A., 336, 337 
Wên, King, 42 

Wharton, W., 123 

White, A. T., 336 

Whitehead, J. H. C., 1 
widgets, 170 

Wielandt, H., 336 

Wilson, R. J., 27, 166, 291, 304, 337 


Wilson, R. M., 335 

windmill, 186 

Witt, E., 288 

word, 33 

wreath product, 229-230, 253, 288, 290 


X 
Y 


Youden square, 270 
Young diagram, 210 
Young tableau, 217 


Z 


Zorn’s Lemma, 55, 313-314, 323 



TOPICS 
TECHNIQUES 
ALGORITHMS 


PETER J. CAMERON 



